Ensuring Write-Your-Own-Reads Consistency in MongoDB
In a question on StackOverflow a few days ago, a user was asking how to ensure that a document hasn't changed between when a client read the document and wrote to it. If user A reads the document and makes some changes (through a web form, for instance), the change should be accepted if and only if no other user B has updated the document since when user A read the document.
MongoDB doesn't support transactions, and even if it did, they wouldn't help in this case. An assumption underlying the question is that the time between any given user's reads and writes is long -- otherwise explicit/pessimistic locking would be the simplest solution -- in which case holding a transaction open on a traditional database server would be prohibitively costly (in terms of resource usage and performance).
The solution is to leverage MongoDB's atomic update semantics with an optimistic concurrency solution. This comprises four basic steps: 1. read a document; 2. modify the document (i.e. present it to the user in a web form); 3. validate that the document hasn't changed; 4. commit or abandon the user's update. For anyone who's used a source code version control system before, these steps should be familiar (i.e. pull, work locally, commit, and push for git users).
How to Version Your Documents
In order to do steps 1 and 3 efficiently, add a version field to the document. In most cases, this can be an always-increasing number, which MongoDB allows us to easily modify using $inc
. When we create a document for the first time, add a field version
and set its value to 0 (or 1 if you prefer):
{ _id: ObjectId(...),
title: "Ensuring Write-Your-Own-Reads Consistency in MongoDB",
tags: ["mongodb"],
body: "In a question on StackOverflow...",
version: 0
}
Next we load the document and display a form to the user, being sure to hold on to the version in a hidden field. Once the form is submitted, we calculate the changes the user made (or simply prepare to overwrite the document with the new values), but we issue our update using the previous version number:
result = db.blogposts.update(
{'_id': ObjectId(...), 'version': 0},
{'$set': {
# key-value pairs from the form,
# excluding the "version" field
},
'$inc': {
'version': 1}
},
safe=True)
By using a safe=True
write, PyMongo (and the other MongoDB drivers) will batch a call to getLastError
with our update. The result of getLastError
will indicate how many documents were updated -- in our case, since we are not using a multi-update, we expect that number to be 1 in the happy case (i.e. no one else updated the document in the meantime) or 0 if a conflict was detected (i.e. the version is no longer 0). The getLastError
output will look like this:
{'updatedExisting': True, 'connectionId': 112, 'ok': 1.0, 'err': None, 'n': 1}
The n
field in the getLastError
output is the one we want to look at -- if it's 1, then the document was updated and will now have a version number of 1; if it's 0, then there was a conflicting update.
What to do in the case of conflicting updates is up to the application, MongoDB doesn't handle this for you at all (in fact, MongoDB isn't even aware of the fact that you're trying to do optimistic locking at all). A few common solutions are:
- Show the user the form again, with a message indicating that another user has updated the document, and asking the user to handle the error
- If possible, handle the conflict automatically in your application (this might require having tracked all the values of the previous version of the form, in order to detect which values have changed and what the delta is)
- Throw up your arms, admit that programming is hard, and go shopping.
OK, maybe not the last one :)