# What the heck is an xrange?

Pop quiz: how would you implement Python’s `xrange`

(known in Python 3.x as `range`

) in Python and without making a list or any other sequence type?

If you answered “with a generator,” you’d be wrong! And it’s likely because you’ve only ever used an `xrange`

in code like:

```
for i in xrange(10):
# do something 10 times
```

In this case (where the `xrange`

is only used as an “argument” to a `for`

loop), a generator would probably suffice. But, in fact, `xrange`

is implemented as an object, not a function or generator. Why? Let’s find out.

# The Sequence Protocol

The `range`

builtin returns a list, which (as the *de facto* sequence type) implements all the sequence protocol methods. `xrange`

was intended to replace `range`

(and did in Python 3.x), so it, too, must support all that a sequence supports.

So, what does a sequence support? Sequences have a finite, ordered bunch of elements, which means they have a length (they support `len`

), they support indexed access (with the `[]`

operator), and most importantly, they can be iterated — in forward order with a `for`

loop, and in reverse order with the help of the `reversed`

builtin. These behaviors are all implemented by way of “dunder” methods, and typically accessed using language syntax (as with `for ... in`

) or builtins (as with `len`

).

Sequences also support two instance methods: `index`

, which returns the 0-based position in the sequence of the first occurrence of the argument, and `count`

, which returns the number of occurrences of the argument.

Perhaps surprisingly, we can actually implement all of this functionality with constant-time and constant-space operations, using only the bounds of the `xrange`

.

# Building an `xrange`

As a quick refresher, both `range`

and `xrange`

take between 1 and 3 arguments. A single argument defines the length of the sequence (with an implicit starting value of 0). Two arguments define the starting point and ending point (actually, the second argument is the next integer *after* the last one that will be a part of the sequence). A third argument defines a step value, the number between each of the elements of the sequence. All arguments must be integers.

With that in hand, we can begin building our implementation of `xrange`

:

```
class xrange(Sequence):
"""Pure-Python implementation of an ``xrange`` (aka ``range``
in Python 3) object. See `the CPython documentation
<http://docs.python.org/py3k/library/functions.html#range>`_
for details.
"""
def __init__(self, *args):
if len(args) == 1:
start, stop, step = 0, args[0], 1
elif len(args) == 2:
start, stop, step = args[0], args[1], 1
elif len(args) == 3:
start, stop, step = args
else:
raise TypeError('xrange() requires 1-3 int arguments')
try:
start, stop, step = int(start), int(stop), int(step)
except ValueError:
raise TypeError('an integer is required')
if step == 0:
raise ValueError('xrange() arg 3 must not be zero')
elif step < 0:
stop = min(stop, start)
else:
stop = max(stop, start)
self._start = start
self._stop = stop
self._step = step
self._len = (stop - start) // step + bool((stop - start) % step)
```

*Update: thanks to sltkr for a correction to the _len calculation on Hacker News.*

Most of this is error checking and adjustment of values for impossible sequences (like a sequence starting at 0 and going to 10 stepping by -1; in these cases, we collapse the `xrange`

to a zero-length sequence with the same start value).

Importantly, since the entire sequence is “known” at initialization time, and is immutable, I precompute the length of the sequence, which is useful in the implementation of several of the methods, as we will see.

With our computed `_start`

, `_stop`

, `_step`

, and `_len`

attributes in hand, we can implement equality, representation, and length trivially:

```
def __repr__(self):
if self._start == 0 and self._step == 1:
return 'xrange(%d)' % self._stop
elif self._step == 1:
return 'xrange(%d, %d)' % (self._start, self._stop)
return 'xrange(%d, %d, %d)' % (self._start, self._stop, self._step)
def __eq__(self, other):
return isinstance(other, xrange) and \
self._start == other._start and \
self._stop == other._stop and \
self._step == other._step
def __len__(self):
return self._len
```

Nothing really surprising here.

Next we’ll implement `index`

, as it serves as a basis for several other methods.

To find the index of an element in an immaterial sequence, we have to do a little math. First, we compute the difference between the element we’re looking for, and the start value for the sequence. Next, we compute the quotient and remainder using the `divmod`

builtin. If the remainder is 0, then the element may be a member of the sequence (it falls on one of the values that might be produced by adding some number of `_step`

s to the `_start`

). If the quotient is between 0 (inclusive) and the sequence’s length (exclusive), then this is definitely a member, and we can return its index; if not, we raise `ValueError`

to indicate it was not found:

```
def index(self, value):
"""Return the 0-based position of integer `value` in
the sequence this xrange represents."""
diff = value - self._start
quotient, remainder = divmod(diff, self._step)
if remainder == 0 and 0 <= quotient < self._len:
return abs(quotient)
raise ValueError('%r is not in range' % value)
```

Note that we’re safe from `ZeroDivisionError`

in the `divmod`

call, since we’ve previously ensured that `_step`

is not zero.

With this in hand, we can trivially implement `__contains__`

(the dunder method that is called by `x in foo`

syntax for checking membership):

```
def __contains__(self, value):
"""Return ``True`` if the integer `value` occurs in
the sequence this xrange represents."""
try:
self.index(value)
return True
except:
return False
```

Implementing `count`

is trivial, as well, since an element can only occur zero or one times in an `xrange`

:

```
def count(self, value):
"""Return the number of ocurrences of integer `value`
in the sequence this xrange represents."""
return int(value in self)
```

Element access with the `[]`

operator (implemented with the `__getitem__`

dunder method) is relatively easy, except for two complications: we have to support negative indexes, for sequence access from the end; and we have to support slice access.

The former is easy, as we can just add the length of the sequence to the index we’re attempting to access, and proceed with bounds-checking as usual. Since the behavior of slices of sequences is somewhat subtle, we implement it in a separate method for clarity:

```
def __getitem__(self, index):
"""Return the element at position ``index`` in the sequence
this xrange represents, or raise :class:`IndexError` if the
position is out of range."""
if isinstance(index, slice):
return self.__getitem_slice(index)
if index < 0:
# negative indexes access from the end
index = self._len + index
if index < 0 or index >= self._len:
raise IndexError('xrange object index out of range')
return self._start + index * self._step
def __getitem_slice(self, slce):
"""Return an xrange which represents the requested slce
of the sequence represented by this xrange.
"""
start, stop, step = slce.start, slce.stop, slce.step
if step == 0:
raise ValueError('slice step cannot be 0')
start = start or self._start
stop = stop or self._stop
if start < 0:
start = max(0, start + self._len)
if stop < 0:
stop = max(start, stop + self._len)
if step is None or step > 0:
return xrange(start, stop, step or 1)
else:
rv = reversed(self)
rv._step = step
return rv
```

Finally, we come to iteration, which is (I suspect) what most people use `xrange`

objects for. Remember that `xrange`

objects act like any other sequence — you can have multiple, independent iterators iterating the same underlying sequence, so we have to implement a separate object for the iterator. The implementation of the iterator is about as simple as you might expect: it tracks the last value it returned, and adds the `_step`

value to that value, then returns it for each element it iterates:

```
class xrangeiterator(Iterator):
"""An iterator for an :class:`xrange`.
"""
def __init__(self, xrangeobj):
self._xrange = xrangeobj
# Intialize the "last outputted value" to the value
# just before the first value; this simplifies next()
self._last = self._xrange._start - self._xrange._step
self._count = 0
def __iter__(self):
"""An iterator is already an iterator, so return ``self``.
"""
return self
def next(self):
"""Return the next element in the sequence represented
by the xrange we are iterating, or raise StopIteration
if we have passed the end of the sequence."""
self._last += self._xrange._step
self._count += 1
if self._count > self._xrange._len:
raise StopIteration()
return self._last
```

# Put it all together

I’ve published the code for the `xrange`

object, along with a comprehensive test suite, to my GitHub.

# Closing Thoughts

Python already has `xrange`

(or, in Python 3, `range`

), as a built-in, implemented efficiently in C, so re-implementing it in Python is largely a learning exercise, rather than code which you should actually use.

However, this implementation is superior to CPython’s implementation in two ways:

I have backported support for slicing,

`index`

, and`count`

from Python 3 to Python 2-compatible code; if these features are important and you need to support multiple versions of Python, this may be of use to you.CPython 2.x’s implementation of

`xrange`

does not implement`__contains__`

, so the expression`1000 in xrange(1001)`

actually has to scan all 1000 elements in the`xrange`

before it can determine if 1000 is a member of the sequence. This was corrected in CPython 3, but very long lists in CPython 2.x will have`__contains__`

performance characteristics more similar to`list`

s than to 2.7`xrange`

s. This implementation uses a constant-time algorithm to compute membership, so it may actually be faster than the CPython 2.x implementation for very large sequences.

## Your thoughts:

Very cool, but a few comments:

If you look at the C implementation in 3.x, you’ll see it still falls back to the slow “iterate over everything” behaviour if you hand it a non-integer object for containment testing (including

`index()`

and`count()`

). This is necessary in order to correctly take custom`__eq__`

implementations into account, as well as to avoid dodginess due to floating point dynamic range limitations. You can use`operator.index()`

to implement a slightly more permissive (but otherwise equivalent) check-and-fallback for your version.Bare excepts are always dubious, since they can hide implementation errors. Use “

`except ValueError:`

“ in the`__contains__`

implementation.Your

`__getitem_slice`

implementation could probably be simplified a fair bit with a call to`slce.indices(self._len)`

Cool, thanks, Nick. I’ll have a look at those tomorrow.

Regarding count() method, standard implementation has some fun built-in:

How do you create those nice typeset python snippets? hilite.me requires me to fiddle with tumblr css...

@void Neat! I think that’s part of what Nick Coghlan was hinting at in his point #1. I’ll see if there’s a reasonably efficient way to implement that in my

`xrange`

@cast42 I use some custom code backing Pygments. You can see all the code for this blog in my GitHub

You’re right, this is a good learning exercise :)

Forgive me for not having run your code to check this, but from what I can tell your slice implementation doesn’t seem to be quite right.

Seems to me that the step on the new xrange should be

`self._step * (step or 1)`

, e.g. it should behave like:@Tom you’re right. The

`__getattr_slice`

implementation is definitely wrong, but I haven’t yet had time to review or update it. @asazernik on github submitted a pull request that partially fixes it, along similar lines to what you suggest, but I haven’t had time to fully implement his fix, either :–(Could someone explain the use/purpose of

`self._len`

instead we can directly use`self.len`

?Thanks

@david,

`self.len`

is not part of the public API of`Sequence`

s, so I chose to make it a private(ish) attribute rather than a public(ish) one. It is more or less directly exposed by`len(your_xrange)`

.Thanks Dan. It was pleasure to learn from your article.

## Comments are closed.