Web A/B Testing with Dabble
Thanks to a series of recent posts on the SvN blog, I've been thinking more about my little Python A/B testing framework, Dabble.
I built Dabble to A/B test (sometimes also called "split test") features on 10gen.com. Following the advice of a blog post I've since lost track of, Dabble configures A/B test parameters entirely in code, follows procedure for independent testing, and generally works without much of a hassle.
Dabble from 10,000 Feet
Aside from a little necessary configuration, Dabble is composed of two user-facing classes: ABTest and ABParameter. The former represents the thing you're testing -- for instance, the design of a signup page or a call-to-action form; the latter is where you actually implement the variants which make up the tested features.
Here's an example, using a class-based Python framework:
class Signup(page):
    path = '/signup'
    abtest = ABTest('signup_button', ['red', 'green'], ['show', 'signup'])
    color = ABParameter('signup_button', ['#FF0000', '#00FF00'])

    def get(self):
        # do some stuff
        self.abtest.record('show')
        return render('signup.html', color=self.color)

    def post(self):
        # validate the submission
        # save the user's information
        self.abtest.record('signup')
And in your template:
<form action="/signup" method="post">
<!-- form goes here -->
<input type="submit" style="background-color:{{color}}"/>
</form>
That's it.
Wait, what?
Yup, that's it. No if/else, no ugly duplication of code across templates. Just one template variable.
A/B testing is really that simple. Ruby's got 7 Minute Abs, and now Python's got 30 second Dabbles... or something... I guess I'll have to work on that.
Behind the Scenes
What magic makes this possible? First, let's take a look at the parent class for ABTest and ABParameter, aptly named AB:
import random


class AB(object):
    # these are set by the configure() function
    _id_provider = None
    _storage = None

    # track the number of alternatives for each
    # named test; helps prevent errors where some
    # parameters have more alts than others
    __n_per_test = {}

    def __init__(self, test_name, alternatives):
        if test_name not in AB.__n_per_test:
            AB.__n_per_test[test_name] = len(alternatives)
        if len(alternatives) != AB.__n_per_test[test_name]:
            raise Exception('Wrong number of alternatives')
        self.test_name = test_name
        self.alternatives = alternatives

    @property
    def identity(self):
        return hash(self._id_provider.get_identity())

    @property
    def alternative(self):
        # look up the stored index; assign one at random on first visit
        alternative = self._storage.get_alternative(
            self.identity, self.test_name)
        if alternative is None:
            alternative = random.randrange(
                len(self.alternatives))
            self._storage.set_alternative(
                self.identity, self.test_name, alternative)
        return alternative
The alternative property here does most of the heavy lifting: it consults the storage object (which, in 10gen's case, is backed by MongoDB, of course) to find out whether the current user has already been assigned an alternative; if not, it picks one at random and stores it for the user. The identity property consults an IdentityProvider to identify the user; this might be implemented with cookies for sites that don't have logins, or might simply be a user ID for sites that do.
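To make the lookup-then-assign flow concrete, here is a minimal sketch with stand-ins for the two collaborators. FixedIdentityProvider, InMemoryStorage, and the assign helper are hypothetical names invented for this illustration, not part of Dabble; only the get_identity, get_alternative, and set_alternative method names come from the code above.

```python
import random


class FixedIdentityProvider:
    """Hypothetical identity provider that always returns one user ID."""

    def __init__(self, user_id):
        self.user_id = user_id

    def get_identity(self):
        return self.user_id


class InMemoryStorage:
    """Hypothetical storage backend that keeps assignments in a dict."""

    def __init__(self):
        self._assignments = {}

    def get_alternative(self, identity, test_name):
        # Returns None when this identity has no assignment yet.
        return self._assignments.get((identity, test_name))

    def set_alternative(self, identity, test_name, alternative):
        self._assignments[(identity, test_name)] = alternative


def assign(storage, provider, test_name, n_alternatives):
    """Mirror the lookup-then-assign logic of the alternative property."""
    identity = hash(provider.get_identity())
    alternative = storage.get_alternative(identity, test_name)
    if alternative is None:
        # First visit: pick an index at random and remember it.
        alternative = random.randrange(n_alternatives)
        storage.set_alternative(identity, test_name, alternative)
    return alternative


storage = InMemoryStorage()
provider = FixedIdentityProvider("user-42")
first = assign(storage, provider, "signup_button", 2)
second = assign(storage, provider, "signup_button", 2)
```

Whatever index the first call picks, the second call returns the same one, which is what keeps a given visitor on the same variant across requests.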
Next up is ABParameter:
class ABParameter(AB):
    # a descriptor object which can be used to vary parameters
    # in a class definition according to A/B testing rules.
    #
    # each viewer who views the given class will always be
    # consistently shown the Nth choice from among the
    # alternatives, even between different attributes in the
    # class, so long as the name is the same between them

    def __get__(self, instance, owner):
        return self.alternatives[self.alternative]
Whenever the attribute to which this descriptor is bound is accessed, Python calls __get__ with a reference to the instance ("self") and the owning class (in our example, the Signup class), and the attribute access evaluates to whatever __get__ returns.
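If descriptors are unfamiliar, the mechanism can be seen in isolation. The sketch below is not Dabble code: Choice is a hypothetical descriptor that returns a fixed element of a list rather than a per-user one, and this Signup class is a stripped-down stand-in for the earlier example.

```python
class Choice:
    """Minimal descriptor: returns one fixed element of a list."""

    def __init__(self, alternatives, index):
        self.alternatives = alternatives
        self.index = index

    def __get__(self, instance, owner):
        # instance is the object the attribute was read from (None when
        # it is read from the class itself); owner is the class.
        return self.alternatives[self.index]


class Signup:
    color = Choice(['#FF0000', '#00FF00'], index=1)


page = Signup()
picked = page.color  # triggers Choice.__get__(page, Signup)
```

Reading page.color never yields the Choice object itself, only the value __get__ returns. ABParameter applies the same trick, with the index looked up per user instead of fixed.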
The storage backends (currently MongoDB and filesystem are supported) are responsible for storing results from the tests, test configurations, and user-to-alternative assignments. The code there is not particularly exciting, so I won't show it here.
Collecting Results
Dabble's storage object has a report method which summarizes the recorded results in a dictionary like:
{
    'test_name': 'signup_button',
    'results': [
        {
            'alternative': 'red',
            'funnel': [{
                'stage': ('show', 'signup'),
                'attempted': 1813,
                'converted': 18
            }],
        },
        {
            'alternative': 'green',
            'funnel': [{
                'stage': ('show', 'signup'),
                'attempted': 1838,
                'converted': 32
            }]
        }
    ]
}
Each alternative's result has a funnel element, which shows, for each successive pair of steps defined in the ABTest, the number of users who attempted the stage (i.e. the first step was recorded) and the number who converted (i.e. the second step was also recorded). Here we can see that about the same number of people attempted the show-signup stage for each alternative (1813 and 1838), but nearly twice as many signed up with the green button (32 versus 18).
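Raw counts are easier to compare as rates. The following sketch turns the report above into a conversion rate per alternative; conversion_rates is a hypothetical helper written for this post, not something Dabble provides, and it assumes Python 3 division and looks only at the first funnel stage.

```python
report = {
    'test_name': 'signup_button',
    'results': [
        {'alternative': 'red',
         'funnel': [{'stage': ('show', 'signup'),
                     'attempted': 1813, 'converted': 18}]},
        {'alternative': 'green',
         'funnel': [{'stage': ('show', 'signup'),
                     'attempted': 1838, 'converted': 32}]},
    ],
}


def conversion_rates(report):
    """Map each alternative to its conversion rate for the first stage."""
    rates = {}
    for result in report['results']:
        stage = result['funnel'][0]
        attempted = stage['attempted']
        # Guard against a stage nobody has reached yet.
        rates[result['alternative']] = (
            stage['converted'] / attempted if attempted else 0.0)
    return rates


rates = conversion_rates(report)
for name, rate in rates.items():
    print('{}: {:.2%}'.format(name, rate))
```

For the numbers above this works out to roughly 0.99% for red and 1.74% for green. Whether a gap like that is statistically meaningful is a separate question that this sketch does not try to answer.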
What's Next?
Here are a few areas where I'd like to improve Dabble:
- Support web frameworks that don't use class-based views (Django, Flask, etc.)
- Build IdentityProvider support for common web frameworks
- DRY reporting (currently the two backends duplicate much of the same functionality in the report method)
- Support other back-ends for storing results
You can help! Fork the repository on GitHub, use it in your application, report or fix bugs, contribute improvements, etc.