Pop quiz: how would you implement Python's
xrange (known in Python 3.x as
range) in Python and without making a list or any other sequence type?
If you answered "with a generator," you'd be wrong! And it's likely because you've only ever used an
xrange in code like:
for i in xrange(10): # do something 10 times
In this case (where the
xrange is only used as an "argument" to a
for loop), a generator would probably suffice. But, in fact,
xrange is implemented as an object, not a function or generator. Why? Let's find out.
The Sequence Protocol
range builtin returns a list, which (as the de facto sequence type) implements all the sequence protocol methods.
xrange was intended to replace
range (and did in Python 3.x), so it, too, must support all that a sequence supports.
So, what does a sequence support? Sequences have a finite, ordered bunch of elements, which means they have a length (they support
len), they support indexed access (with the
 operator), and most importantly, they can be iterated -- in forward order with a
for loop, and in reverse order with the help of the
reversed builtin. These behaviors are all implemented by way of "dunder" methods, and typically accessed using language syntax (as with
for ... in) or builtins (as with
Sequences also support two instance methods:
index, which returns the 0-based position in the sequence of the first occurrence of the argument, and
count, which returns the number of occurrences of the argument.
Perhaps surprisingly, we can actually implement all of this functionality with constant-time and constant-space operations, using only the bounds of the
As a quick refresher, both
xrange take between 1 and 3 arguments. A single argument defines the length of the sequence (with an implicit starting value of 0). Two arguments define the starting point and ending point (actually, the second argument is the next integer after the last one that will be a part of the sequence). A third argument defines a step value, the number between each of the elements of the sequence. All arguments must be integers.
With that in hand, we can begin building our implementation of
class xrange(Sequence): """Pure-Python implementation of an ``xrange`` (aka ``range`` in Python 3) object. See `the CPython documentation <http://docs.python.org/py3k/library/functions.html#range>`_ for details. """ def __init__(self, *args): if len(args) == 1: start, stop, step = 0, args, 1 elif len(args) == 2: start, stop, step = args, args, 1 elif len(args) == 3: start, stop, step = args else: raise TypeError('xrange() requires 1-3 int arguments') try: start, stop, step = int(start), int(stop), int(step) except ValueError: raise TypeError('an integer is required') if step == 0: raise ValueError('xrange() arg 3 must not be zero') elif step < 0: stop = min(stop, start) else: stop = max(stop, start) self._start = start self._stop = stop self._step = step self._len = (stop - start) // step + bool((stop - start) % step)
Most of this is error checking and adjustment of values for impossible sequences (like a sequence starting at 0 and going to 10 stepping by -1; in these cases, we collapse the
xrange to a zero-length sequence with the same start value).
Importantly, since the entire sequence is "known" at initialization time, and is immutable, I precompute the length of the sequence, which is useful in the implementation of several of the methods, as we will see.
With our computed
_len attributes in hand, we can implement equality, representation, and length trivially:
def __repr__(self): if self._start == 0 and self._step == 1: return 'xrange(%d)' % self._stop elif self._step == 1: return 'xrange(%d, %d)' % (self._start, self._stop) return 'xrange(%d, %d, %d)' % (self._start, self._stop, self._step) def __eq__(self, other): return isinstance(other, xrange) and \ self._start == other._start and \ self._stop == other._stop and \ self._step == other._step def __len__(self): return self._len
Nothing really surprising here.
Next we'll implement
index, as it serves as a basis for several other methods.
To find the index of an element in an immaterial sequence, we have to do a little math. First, we compute the difference between the element we're looking for, and the start value for the sequence. Next, we compute the quotient and remainder using the
divmod builtin. If the remainder is 0, then the element may be a member of the sequence (it falls on one of the values that might be produced by adding some number of
_steps to the
_start). If the quotient is between 0 (inclusive) and the sequence's length (exclusive), then this is definitely a member, and we can return its index; if not, we raise
ValueError to indicate it was not found:
def index(self, value): """Return the 0-based position of integer `value` in the sequence this xrange represents.""" diff = value - self._start quotient, remainder = divmod(diff, self._step) if remainder == 0 and 0 <= quotient < self._len: return abs(quotient) raise ValueError('%r is not in range' % value)
Note that we're safe from
ZeroDivisionError in the
divmod call, since we've previously ensured that
_step is not zero.
With this in hand, we can trivially implement
__contains__ (the dunder method that is called by
x in foo syntax for checking membership):
def __contains__(self, value): """Return ``True`` if the integer `value` occurs in the sequence this xrange represents.""" try: self.index(value) return True except: return False
count is trivial, as well, since an element can only occur zero or one times in an
def count(self, value): """Return the number of ocurrences of integer `value` in the sequence this xrange represents.""" return int(value in self)
Element access with the
 operator (implemented with the
__getitem__ dunder method) is relatively easy, except for two complications: we have to support negative indexes, for sequence access from the end; and we have to support slice access.
The former is easy, as we can just add the length of the sequence to the index we're attempting to access, and proceed with bounds-checking as usual. Since the behavior of slices of sequences is somewhat subtle, we implement it in a separate method for clarity:
def __getitem__(self, index): """Return the element at position ``index`` in the sequence this xrange represents, or raise :class:`IndexError` if the position is out of range.""" if isinstance(index, slice): return self.__getitem_slice(index) if index < 0: # negative indexes access from the end index = self._len + index if index < 0 or index >= self._len: raise IndexError('xrange object index out of range') return self._start + index * self._step def __getitem_slice(self, slce): """Return an xrange which represents the requested slce of the sequence represented by this xrange. """ start, stop, step = slce.start, slce.stop, slce.step if step == 0: raise ValueError('slice step cannot be 0') start = start or self._start stop = stop or self._stop if start < 0: start = max(0, start + self._len) if stop < 0: stop = max(start, stop + self._len) if step is None or step > 0: return xrange(start, stop, step or 1) else: rv = reversed(self) rv._step = step return rv
Finally, we come to iteration, which is (I suspect) what most people use
xrange objects for. Remember that
xrange objects act like any other sequence -- you can have multiple, independent iterators iterating the same underlying sequence, so we have to implement a separate object for the iterator. The implementation of the iterator is about as simple as you might expect: it tracks the last value it returned, and adds the
_step value to that value, then returns it for each element it iterates:
class xrangeiterator(Iterator): """An iterator for an :class:`xrange`. """ def __init__(self, xrangeobj): self._xrange = xrangeobj # Intialize the "last outputted value" to the value # just before the first value; this simplifies next() self._last = self._xrange._start - self._xrange._step self._count = 0 def __iter__(self): """An iterator is already an iterator, so return ``self``. """ return self def next(self): """Return the next element in the sequence represented by the xrange we are iterating, or raise StopIteration if we have passed the end of the sequence.""" self._last += self._xrange._step self._count += 1 if self._count > self._xrange._len: raise StopIteration() return self._last
Put it all together
I've published the code for the
xrange object, along with a comprehensive test suite, to my GitHub.
Python already has
xrange (or, in Python 3,
range), as a built-in, implemented efficiently in C, so re-implementing it in Python is largely a learning exercise, rather than code which you should actually use.
However, this implementation is superior to CPython's implementation in two ways:
I have backported support for slicing,
countfrom Python 3 to Python 2-compatible code; if these features are important and you need to support multiple versions of Python, this may be of use to you.
CPython 2.x's implementation of
xrangedoes not implement
__contains__, so the expression
1000 in xrange(1001)actually has to scan all 1000 elements in the
xrangebefore it can determine if 1000 is a member of the sequence. This was corrected in CPython 3, but very long lists in CPython 2.x will have
__contains__performance characteristics more similar to
lists than to 2.7
xranges. This implementation uses a constant-time algorithm to compute membership, so it may actually be faster than the CPython 2.x implementation for very large sequences.