What the heck is an xrange?

Pop quiz: how would you implement Python’s xrange (known in Python 3.x as range) in Python and without making a list or any other sequence type?

If you answered “with a generator,” you’d be wrong! And it’s likely because you’ve only ever used an xrange in code like:

for i in xrange(10):
    # do something 10 times

In this case (where the xrange is only used as an “argument” to a for loop), a generator would probably suffice. But, in fact, xrange is implemented as an object, not a function or generator. Why? Let’s find out.

The Sequence Protocol

The range builtin returns a list, which (as the de facto sequence type) implements all the sequence protocol methods. xrange was intended to replace range (and did in Python 3.x), so it, too, must support all that a sequence supports.

So, what does a sequence support? Sequences have a finite, ordered bunch of elements, which means they have a length (they support len), they support indexed access (with the [] operator), and most importantly, they can be iterated — in forward order with a for loop, and in reverse order with the help of the reversed builtin. These behaviors are all implemented by way of “dunder” methods, and typically accessed using language syntax (as with for ... in) or builtins (as with len).

Sequences also support two instance methods: index, which returns the 0-based position in the sequence of the first occurrence of the argument, and count, which returns the number of occurrences of the argument.

Perhaps surprisingly, we can actually implement all of this functionality with constant-time and constant-space operations, using only the bounds of the xrange.

Building an xrange

As a quick refresher, both range and xrange take between 1 and 3 arguments. A single argument defines the length of the sequence (with an implicit starting value of 0). Two arguments define the starting point and ending point (actually, the second argument is the next integer after the last one that will be a part of the sequence). A third argument defines a step value, the number between each of the elements of the sequence. All arguments must be integers.

With that in hand, we can begin building our implementation of xrange:

class xrange(Sequence):
    """Pure-Python implementation of an ``xrange`` (aka ``range``
    in Python 3) object. See `the CPython documentation
    <http://docs.python.org/py3k/library/functions.html#range>`_
    for details.
    """

    def __init__(self, *args):
        if len(args) == 1:
            start, stop, step = 0, args[0], 1
        elif len(args) == 2:
            start, stop, step = args[0], args[1], 1
        elif len(args) == 3:
            start, stop, step = args
        else:
            raise TypeError('xrange() requires 1-3 int arguments')

        try:
            start, stop, step = int(start), int(stop), int(step)
        except ValueError:
            raise TypeError('an integer is required')


        if step == 0:
            raise ValueError('xrange() arg 3 must not be zero')
        elif step < 0:
            stop = min(stop, start)
        else:
            stop = max(stop, start)

        self._start = start
        self._stop = stop
        self._step = step
        self._len = (stop - start) // step + bool((stop - start) % step)

Update: thanks to sltkr for a correction to the _len calculation on Hacker News.

Most of this is error checking and adjustment of values for impossible sequences (like a sequence starting at 0 and going to 10 stepping by -1; in these cases, we collapse the xrange to a zero-length sequence with the same start value).

Importantly, since the entire sequence is “known” at initialization time, and is immutable, I precompute the length of the sequence, which is useful in the implementation of several of the methods, as we will see.

With our computed _start, _stop, _step, and _len attributes in hand, we can implement equality, representation, and length trivially:

def __repr__(self):
    if self._start == 0 and self._step == 1:
        return 'xrange(%d)' % self._stop
    elif self._step == 1:
        return 'xrange(%d, %d)' % (self._start, self._stop)
    return 'xrange(%d, %d, %d)' % (self._start, self._stop, self._step)

def __eq__(self, other):
    return isinstance(other, xrange) and \
           self._start == other._start and \
           self._stop == other._stop and \
           self._step == other._step

def __len__(self):
    return self._len

Nothing really surprising here.

Next we’ll implement index, as it serves as a basis for several other methods.

To find the index of an element in an immaterial sequence, we have to do a little math. First, we compute the difference between the element we’re looking for, and the start value for the sequence. Next, we compute the quotient and remainder using the divmod builtin. If the remainder is 0, then the element may be a member of the sequence (it falls on one of the values that might be produced by adding some number of _steps to the _start). If the quotient is between 0 (inclusive) and the sequence’s length (exclusive), then this is definitely a member, and we can return its index; if not, we raise ValueError to indicate it was not found:

def index(self, value):
    """Return the 0-based position of integer `value` in
    the sequence this xrange represents."""
    diff = value - self._start
    quotient, remainder = divmod(diff, self._step)
    if remainder == 0 and 0 <= quotient < self._len:
        return abs(quotient)
    raise ValueError('%r is not in range' % value)

Note that we’re safe from ZeroDivisionError in the divmod call, since we’ve previously ensured that _step is not zero.

With this in hand, we can trivially implement __contains__ (the dunder method that is called by x in foo syntax for checking membership):

def __contains__(self, value):
    """Return ``True`` if the integer `value` occurs in
    the sequence this xrange represents."""
    try:
        self.index(value)
        return True
    except:
        return False

Implementing count is trivial, as well, since an element can only occur zero or one times in an xrange:

def count(self, value):
    """Return the number of ocurrences of integer `value`
    in the sequence this xrange represents."""
    return int(value in self)

Element access with the [] operator (implemented with the __getitem__ dunder method) is relatively easy, except for two complications: we have to support negative indexes, for sequence access from the end; and we have to support slice access.

The former is easy, as we can just add the length of the sequence to the index we’re attempting to access, and proceed with bounds-checking as usual. Since the behavior of slices of sequences is somewhat subtle, we implement it in a separate method for clarity:

def __getitem__(self, index):
    """Return the element at position ``index`` in the sequence
    this xrange represents, or raise :class:`IndexError` if the
    position is out of range."""
    if isinstance(index, slice):
        return self.__getitem_slice(index)
    if index < 0:
        # negative indexes access from the end
        index = self._len + index
    if index < 0 or index >= self._len:
        raise IndexError('xrange object index out of range')
    return self._start + index * self._step

def __getitem_slice(self, slce):
    """Return an xrange which represents the requested slce
    of the sequence represented by this xrange.
    """
    start, stop, step = slce.start, slce.stop, slce.step
    if step == 0:
        raise ValueError('slice step cannot be 0')

    start = start or self._start
    stop = stop or self._stop
    if start < 0:
        start = max(0, start + self._len)
    if stop < 0:
        stop = max(start, stop + self._len)

    if step is None or step > 0:
        return xrange(start, stop, step or 1)
    else:
        rv = reversed(self)
        rv._step = step
        return rv

Finally, we come to iteration, which is (I suspect) what most people use xrange objects for. Remember that xrange objects act like any other sequence — you can have multiple, independent iterators iterating the same underlying sequence, so we have to implement a separate object for the iterator. The implementation of the iterator is about as simple as you might expect: it tracks the last value it returned, and adds the _step value to that value, then returns it for each element it iterates:

class xrangeiterator(Iterator):
    """An iterator for an :class:`xrange`.
    """

    def __init__(self, xrangeobj):
        self._xrange = xrangeobj

        # Intialize the "last outputted value" to the value
        # just before the first value; this simplifies next()
        self._last = self._xrange._start - self._xrange._step
        self._count = 0

    def __iter__(self):
        """An iterator is already an iterator, so return ``self``.
        """
        return self

    def next(self):
        """Return the next element in the sequence represented
        by the xrange we are iterating, or raise StopIteration
        if we have passed the end of the sequence."""
        self._last += self._xrange._step
        self._count += 1
        if self._count > self._xrange._len:
            raise StopIteration()
        return self._last

Put it all together

I’ve published the code for the xrange object, along with a comprehensive test suite, to my GitHub.

Closing Thoughts

Python already has xrange (or, in Python 3, range), as a built-in, implemented efficiently in C, so re-implementing it in Python is largely a learning exercise, rather than code which you should actually use.

However, this implementation is superior to CPython’s implementation in two ways:

  1. I have backported support for slicing, index, and count from Python 3 to Python 2-compatible code; if these features are important and you need to support multiple versions of Python, this may be of use to you.

  2. CPython 2.x’s implementation of xrange does not implement __contains__, so the expression 1000 in xrange(1001) actually has to scan all 1000 elements in the xrange before it can determine if 1000 is a member of the sequence. This was corrected in CPython 3, but very long lists in CPython 2.x will have __contains__ performance characteristics more similar to lists than to 2.7 xranges. This implementation uses a constant-time algorithm to compute membership, so it may actually be faster than the CPython 2.x implementation for very large sequences.

python, internals

Your thoughts:

Very cool, but a few comments:

  1. If you look at the C implementation in 3.x, you’ll see it still falls back to the slow “iterate over everything” behaviour if you hand it a non-integer object for containment testing (including index() and count()). This is necessary in order to correctly take custom __eq__ implementations into account, as well as to avoid dodginess due to floating point dynamic range limitations. You can use operator.index() to implement a slightly more permissive (but otherwise equivalent) check-and-fallback for your version.

  2. Bare excepts are always dubious, since they can hide implementation errors. Use “except ValueError:“ in the __contains__ implementation.

  3. Your __getitem_slice implementation could probably be simplified a fair bit with a call to slce.indices(self._len)

— Nick Coghlan, 2012-06-19 12:18 am

Cool, thanks, Nick. I’ll have a look at those tomorrow.

— Dan Crosta, 2012-06-19 12:20 am

Regarding count() method, standard implementation has some fun built-in:

>>> class A(object):
...  def __eq__(self, other):
...    if other in (3,5):
...      return True
...    else:
...      return False
... 
>>> a = A()
>>> a == 5
True
>>> a == 3
True
>>> a == 4
False
>>> range(10).count(a)
2
— void, 2012-06-19 09:10 am

How do you create those nice typeset python snippets? hilite.me requires me to fiddle with tumblr css...

— cast42, 2012-06-19 10:10 am

@void Neat! I think that’s part of what Nick Coghlan was hinting at in his point #1. I’ll see if there’s a reasonably efficient way to implement that in my xrange

— Dan Crosta, 2012-06-20 12:01 am

@cast42 I use some custom code backing Pygments. You can see all the code for this blog in my GitHub

— Dan Crosta, 2012-06-20 12:02 am

You’re right, this is a good learning exercise :)

Forgive me for not having run your code to check this, but from what I can tell your slice implementation doesn’t seem to be quite right.

Seems to me that the step on the new xrange should be self._step * (step or 1), e.g. it should behave like:

>>> range(0,100,2)[20:40:3]
[40, 46, 52, 58, 64, 70, 76]
— Tom, 2012-06-21 10:57 pm

@Tom you’re right. The __getattr_slice implementation is definitely wrong, but I haven’t yet had time to review or update it. @asazernik on github submitted a pull request that partially fixes it, along similar lines to what you suggest, but I haven’t had time to fully implement his fix, either :–(

— Dan Crosta, 2012-06-22 02:49 pm

Could someone explain the use/purpose of self._len instead we can directly use self.len?

Thanks

— david, 2012-06-25 09:55 am

@david, self.len is not part of the public API of Sequences, so I chose to make it a private(ish) attribute rather than a public(ish) one. It is more or less directly exposed by len(your_xrange).

— Dan Crosta, 2012-06-28 02:59 pm

Thanks Dan. It was pleasure to learn from your article.

— Miro, 2012-07-25 01:51 pm

Comments are closed.