|Programmer's Python Data - Iterables|
|Written by Mike James|
|Monday, 15 May 2023|
Page 2 of 3
Sequences Are Iterables
Another way to see the difference between a sequence and an iterable is that a sequence has to have a __getitem__ method. As explained in Chapter 15, __getitem__ is called when indexing is used, so myObject[key] translates to myObject.__getitem__(key). An object is a sequence when it has a __getitem__ method that only accepts an integer key, i.e. an index, that starts from 0 and runs to one less than the length of the sequence.
A sequence does not have to have an __iter__ method and if it doesn’t it isn’t strictly an iterable. However, it is easy to see that if you have a __getitem__ method that uses an integer index you can easily create an iterator:
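class myIterator:
    def __init__(self, seq):
        # seq is any object with an integer-indexed __getitem__
        self.seq = seq
        self.index = 0
    def __iter__(self):
        return self
    def __next__(self):
        # indexing calls seq.__getitem__(self.index)
        res = self.seq[self.index]
        self.index += 1
        return res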
You can see that all that happens is that the iterator keeps track of the current index, adding one after each call, and then uses indexing to return the next element, using __getitem__ to find the indexed item. This is such an easy transformation that the system will do this for you, even if the class doesn’t have an explicit __iter__ defined, and it ends the iteration when __getitem__ raises IndexError. This is how and why you can use a sequence in a for loop.
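As a minimal sketch of this fallback, the class below defines only __getitem__ and can still be used in a for loop or comprehension. The class name Squares and its contents are invented for illustration and are not part of the original text.

```python
class Squares:
    """Hypothetical sequence-like class: has __getitem__ but no __iter__."""

    def __init__(self, length):
        self.length = length

    def __getitem__(self, index):
        # Raising IndexError past the end is what lets the automatic
        # index-based iteration know when to stop.
        if not 0 <= index < self.length:
            raise IndexError(index)
        return index * index


squares = Squares(5)
collected = [value for value in squares]  # works without an __iter__ method
print(collected)  # [0, 1, 4, 9, 16]
```

The system starts at index 0 and keeps asking for the next index until IndexError is raised, which is the same job the hand-written iterator does.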
Notice that this also means that if you want to test to see if a class is iterable you have to test that it has a defined __iter__ method. If it does it is a true iterable. However, if it has a __getitem__ while it may not be a true iterable it can still be used as one. If a class has both an __iter__ method and a __getitem__ method, the system will use the __iter__ method in preference.
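The following sketch illustrates the three cases. The class names and the helper can_be_iterated are hypothetical, introduced only for this example. It relies on the fact that collections.abc.Iterable checks for __iter__ only, whereas the built-in iter() also accepts objects that have just __getitem__.

```python
from collections.abc import Iterable


class OnlyGetItem:
    """Hypothetical class with __getitem__ but no __iter__."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index


class BothMethods(OnlyGetItem):
    """Hypothetical class with both methods; __iter__ should win."""

    def __iter__(self):
        return iter(["from __iter__"])


def can_be_iterated(obj):
    """Hypothetical helper: True if iter() accepts obj."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


strict_check = isinstance(OnlyGetItem(), Iterable)  # looks for __iter__ only
loose_check = can_be_iterated(OnlyGetItem())        # also allows __getitem__
print(strict_check, loose_check)  # False True
print(list(OnlyGetItem()))        # [0, 1, 2]
print(list(BothMethods()))        # ['from __iter__']
```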
In most cases you want to terminate the iteration when the elements run out. You can do this by raising the StopIteration exception:
import random

class randItIterator():
    def __init__(self):
        self.n = 0
    def __next__(self):
        self.n = self.n + 1
        # stop after 20 values have been returned
        if self.n > 20:
            raise StopIteration
        return random.random()
This version of the iterator provides 20 random numbers and then stops. If used in the for loop it now prints 20 values. The numbers object is like a container with just 20 random numbers.
In nearly all practical cases the iteration stops when all of the elements in the container have been iterated.
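To make the role of StopIteration concrete, here is a sketch of roughly what a for loop does on your behalf, written out as an explicit while loop. The class RandomTwenty is a hypothetical stand-in for an iterator that yields 20 random values; it is given an __iter__ that returns itself so that it can be passed to iter() directly.

```python
import random


class RandomTwenty:
    """Hypothetical iterator: returns 20 random values, then stops."""

    def __init__(self):
        self.n = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.n += 1
        if self.n > 20:
            raise StopIteration
        return random.random()


iterator = iter(RandomTwenty())
values = []
while True:
    try:
        value = next(iterator)   # calls iterator.__next__()
    except StopIteration:
        break                    # the signal that the elements have run out
    values.append(value)

print(len(values))  # 20
```

The exception never reaches the user of a for loop because the loop catches it and simply ends.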
A More Realistic Example
The example so far has been simple enough for us not to worry about the connection between the iterable and the iterator. In most cases the iterable has data stored in it that the iterator has to access. Let’s see how this is achieved.
First we need an iterable that stores some data – in this case a set of random values stored in a list:
class randList(object):
    def __init__(self):
        self.data = []
        for i in range(5):
            self.data.append(random.random())
This just stores five random values in a list. Now the iterator for this iterable has to have access to the list:
class randListIterator():
    def __init__(self, rndList):
        self._n = 0
        self._data = rndList.data
Now the iterator needs to set up some details of what is to be iterated. It stores a reference to the list with the data and a counter to indicate where it has got to in the list. The __next__ method is much the same as before only now it has to check and update the value of _n.
To make the connection, the iterable has to pass a reference to itself when it creates the iterator:
def __iter__(self):
    return randListIterator(self)
Now we can write a for loop like:
numbers = randList()
for x in numbers:
    print(x)
for x in numbers:
    print(x)
which displays the same five random numbers twice, once for each loop.
The complete program is:
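import random

class randListIterator():
    def __init__(self, rndList):
        self._n = 0
        self._data = rndList.data
    def __next__(self):
        if self._n >= len(self._data):
            raise StopIteration
        item = self._data[self._n]
        self._n = self._n + 1
        return item

class randList(object):
    def __init__(self):
        self.data = []
        for i in range(5):
            self.data.append(random.random())
    def __iter__(self):
        return randListIterator(self)

numbers = randList()
for x in numbers:
    print(x)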
One thing that often puzzles programmers when they first meet the iterable is why it returns an iterator that has a __next__ method. Why not just return a __next__ method directly? After all, creating an object of the sort given in the previous section is moderately complex.
You can avoid creating a new object and allow the iterable to return a __next__, but it would allow only a single iteration to be in progress at any one time. For example, randList given in the previous section creates a new iterator object each time __iter__ is called. The iterator object has to keep track of the state of the iteration so that a call to __next__ returns the correct next element. In other words, the iterator object keeps track of a single iteration and thus there can be any number of iterations in action at any given time. Consider, for example:
numbers = randList()
for x in numbers:
    for y in numbers:
        print(x, y)
Each iteration has its own iterator object and you see 25 pairs of numbers, with each of the five random numbers paired in turn with all five.
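The independence of the iterators can be seen directly by asking for two of them and advancing each by hand. This sketch restates the two classes in compact form so that it runs on its own. The added __iter__ on the iterator and the list comprehension are conveniences assumed for the example, not part of the original listing.

```python
import random


class randListIterator:
    def __init__(self, rndList):
        self._n = 0                  # position belongs to this iterator only
        self._data = rndList.data

    def __iter__(self):
        return self

    def __next__(self):
        if self._n >= len(self._data):
            raise StopIteration
        item = self._data[self._n]
        self._n += 1
        return item


class randList:
    def __init__(self):
        self.data = [random.random() for _ in range(5)]

    def __iter__(self):
        return randListIterator(self)   # a fresh iterator on every call


numbers = randList()
first = iter(numbers)
second = iter(numbers)

a = next(first)    # first iterator moves to position 1
b = next(first)    # ... and then to position 2
c = next(second)   # second iterator is still at the start

print(a == numbers.data[0], b == numbers.data[1], c == numbers.data[0])  # True True True

pairs = [(x, y) for x in numbers for y in numbers]
print(len(pairs))  # 25
```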
Compare this to the frequently encountered attempt to avoid having to create a separate class for the iterator:
class randList(object):
    def __init__(self):
        self.data = []
        for i in range(5):
            self.data.append(random.random())
    def __iter__(self):
        # resets the one shared counter every time an iteration starts
        self._n = 0
        return self
    def __next__(self):
        if self._n >= len(self.data):
            raise StopIteration
        item = self.data[self._n]
        self._n = self._n + 1
        return item
You will often see this single-class approach to implementing an iterable in examples – it isn’t the way to do the job. In this case the iterable has an __iter__ method that returns a reference to itself – randList is both the iterable and its own iterator. It has a __next__ method so it can act as an iterator and it is defined to return the random numbers stored in data as before. It also has to create _n to keep track of where the iteration has got to but apart from this the logic is the same. Notice that now that __next__ is part of the iterable class it can directly access data.
If you try it out in a single for loop everything works. It works for many situations in which only a single iteration is active at any given time. If you try it out in a nested for loop, or in any situation where more than one iterator is required, you will find it doesn’t work:
numbers = randList()
for x in numbers:
    for y in numbers:
        print(x, y)
What happens is that when the outer for loop starts it calls __iter__ and gets a reference to numbers. This is fine and the for loop calls __next__ to get the first random number and sets x to reference it. When the inner for loop starts it also calls __iter__, which resets _n to 0 on the same object. This results in y being set to the first element, as it should be. The inner loop then continues to call __next__ to step through the elements of the list until it reaches the end. At this point the outer loop resumes, but now the value of _n is 5, the value it reached at the end of the inner loop, and so the outer loop comes to an end. This means that the outer loop stops after one iteration rather than continuing for five. As there is only one iterator, you only see five pairs of random numbers.
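The failure is easy to confirm by counting the pairs produced. In this sketch the single-class version is renamed SharedStateList, a hypothetical name chosen to avoid confusion with the two-class randList, and the nested loops are written as a comprehension so that the result can be inspected.

```python
import random


class SharedStateList:
    """Hypothetical single-class iterable that is its own iterator."""

    def __init__(self):
        self.data = [random.random() for _ in range(5)]

    def __iter__(self):
        self._n = 0          # one counter shared by every loop over this object
        return self

    def __next__(self):
        if self._n >= len(self.data):
            raise StopIteration
        item = self.data[self._n]
        self._n += 1
        return item


numbers = SharedStateList()
pairs = [(x, y) for x in numbers for y in numbers]

print(len(pairs))                                    # 5, not 25
print(all(x == numbers.data[0] for x, _ in pairs))   # True: x never advances
```

Every pair has the first element as x because the inner loop exhausts the shared counter before the outer loop gets a second turn.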
The point is that you need a new iterator for each independent iteration through the values in the container. The only alternative is to implement a single iterator to keep track of multiple iterations and this is a hard thing to do correctly.
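When the data is already held in a built-in container, one shortcut that still provides a new iterator for each iteration is to hand back the container's own iterator. This is a sketch of an alternative, not the approach used in the listings above, and it assumes the data lives in a list.

```python
import random


class randList:
    def __init__(self):
        self.data = [random.random() for _ in range(5)]

    def __iter__(self):
        # iter() on a list returns a new, independent list iterator each time
        return iter(self.data)


numbers = randList()
pairs = [(x, y) for x in numbers for y in numbers]
print(len(pairs))  # 25
```

The nested loops work because each call to __iter__ produces a separate iterator object with its own position.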
|Last Updated ( Monday, 15 May 2023 )|