Battling the Fallacies of Distributed Computing with RavenDB

Recently, I deployed some code that had the following requirements:

  • When post {x} is first published in CMS A, import a summary of {x} into CMS B
  • When {x} is subsequently updated and re-published in CMS A, do nothing

Seemed pretty simple.  Due to the limited API support in CMS B, I used RavenDB to maintain a record of posts I had already imported from CMS A to CMS B in order to honor the second requirement.

Worked on my machine

Immediately after deploying, I basked for about 30 seconds in the praise from our editors. Moments later, I started receiving reports of duplicate posts showing up in CMS B. I was flabbergasted. I had been careful to handle the dupe scenario in code.

I checked the code again. By design, the code prevents dupes from being created in CMS B…. unless the duplicates arrived less than a few milliseconds apart. Fail.

When I looked at the server logs, the duplicate notifications were indeed happening less than a millisecond apart. For a few minutes, I thought about how I might prevent the duplicate publish notifications. Ultimately, I embraced the first two fallacies of distribute computing instead:

 

  1. The network is reliable
  2. Latency is zero

 

Duplicate notifications arriving a couple of milliseconds apart are a fact of life. Deal with it.

Raven etags and concurrency control to the rescue

One of my favorite aspects of Raven is that it’s ACID when you need it, BASE when you don’t. Here’s how we made it really ACID-y to solve the duplicate import problem:

  1. When a notification comes in, check Raven to see if we’ve already imported the post.
  2. If the post has never been seen, create a new document in Raven with a null etag and using optimistic concurrency. Used this way, the Raven client will throw an exception if anyone else tries to create the same document. Here’s the code:

        using(var session = store.OpenSession())

                {

                    session.Advanced.UseOptimisticConcurrency = true; 

                    var post = new Post()

                                   {

                                       Id = id,

                                       ImportStatus = ImportStatus.ImportStarted,

                                       ImportStartedAtUtc = DateTimeOffset.Now

                                   }; 

                    session.Store(post,null);

                    session.SaveChanges();

                }

     

  3. Send a message using our service bus to actually perform the import – Though not related to RavenDB directly, see Jimmy Bogard’s post on how to use messaging patterns to interop transactionally with non-transactional systems (in my case, the CMS B APIs do not participate in a distributed transaction, so we had another source of dupes when message failures were retried after a transaction rollback).

Steps 1-3 are wrapped in a distributed transaction. When a simultaneous duplicate notification occurs, step 2 will fail for all but one of the notifications. All the failed transactions get rolled back and dupes no longer show up in CMS B.

 

Discuss this post

You're in Easy Mode. If you prefer, you can use XHTML Mode instead.
As a new user, you may notice a few temporary content restrictions. Click here for more info.