• Raven DB: Lessons Learned: Caching Contexts

    Caching

    When you talk about caching in terms of the full web application stack, you've typically got the following layers:

    • Browser cache
    • CDN cache
    • Application output cache
    • Data cache

    However, in an application leveraging Raven DB, the last layer actually gets split into two layers.

    Some Background

    Raven DB operates by having the client generate HTTP requests, which are sent across the wire to the server. Therefore, the standard caching mechanisms that HTTP provides are available. This means that if a request is made and Raven DB determines the data hasn't changed since the last time you requested it, the server responds with HTTP 304 Not Modified, instructing the Raven client to continue using what it got last time.

    using (var session = store.OpenSession())
    {
        //if the server doesn't have anything different from the last time this was requested,
        //it won't do any processing, and will just return HTTP 304 Not Modified to the client.
        //the client will then use what it got last time.
        var foo = session.Load<Foo>("foos/123");
    }
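
    To make the 304 exchange concrete, here is a minimal in-memory sketch. FakeServer and FakeClient are hypothetical stand-ins, not Raven types, and a plain version counter plays the role of the ETag. Note that every Load still makes a round trip to the server; only the full response is skipped.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical stand-in for the server: a version counter per document acts as the ETag.
    public sealed class FakeServer
    {
        private readonly Dictionary<string, (int Version, string Body)> docs =
            new Dictionary<string, (int Version, string Body)>();

        public int FullResponses { get; private set; }

        public void Put(string id, string body)
        {
            int next = docs.TryGetValue(id, out var old) ? old.Version + 1 : 1;
            docs[id] = (next, body);
        }

        // Returns 304 with no body when the caller's ETag is current, else 200 with the body.
        public (int Status, string Body, int ETag) Get(string id, int? ifNoneMatch)
        {
            var doc = docs[id];
            if (ifNoneMatch == doc.Version)
            {
                return (304, null, doc.Version);
            }
            FullResponses++;
            return (200, doc.Body, doc.Version);
        }
    }

    // Hypothetical stand-in for the client: remembers the last body and ETag per document.
    public sealed class FakeClient
    {
        private readonly FakeServer server;
        private readonly Dictionary<string, (int ETag, string Body)> cache =
            new Dictionary<string, (int ETag, string Body)>();

        public FakeClient(FakeServer server) { this.server = server; }

        public string Load(string id)
        {
            int? known = cache.TryGetValue(id, out var hit) ? hit.ETag : (int?)null;
            var response = server.Get(id, known);
            if (response.Status == 304)
            {
                return hit.Body; // nothing changed: reuse what we got last time
            }
            cache[id] = (response.ETag, response.Body);
            return response.Body;
        }
    }

    public static class Program
    {
        public static void Main()
        {
            var server = new FakeServer();
            server.Put("foos/123", "v1");
            var client = new FakeClient(server);

            client.Load("foos/123");                 // 200: full response
            client.Load("foos/123");                 // 304: served from the client's copy
            server.Put("foos/123", "v2");
            string latest = client.Load("foos/123"); // 200 again, because the ETag changed

            Console.WriteLine(latest + " after " + server.FullResponses + " full responses");
            // prints "v2 after 2 full responses"
        }
    }
    ```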
    

    So the first layer of the data cache, you get for free out of the box with Raven. Fortunately the second layer is available as well, if your application needs it.

    Aggressive Data Caching

    With Raven DB, it's possible to instruct the client to not even ask the server for data again, thereby skipping the HTTP request, even if it might result in a 304. Here's what that looks like:

    using (var session = store.OpenSession())
    {
        //set up an aggressive caching context, instructing the client to not
        //make an http request if it made one within the last 5 minutes
        using (session.Advanced.DocumentStore.AggressivelyCacheFor(TimeSpan.FromMinutes(5)))
        {
            var foo = session.Load<Foo>("foos/123"); //may or may not make a request
        }
    }
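
    The behavior can be modeled with a small time-based cache. This is only a sketch of the idea, not how the Raven client implements it: TtlCache is a hypothetical class, and the clock is injected so the example can advance time without sleeping. The trade-off is visible in the code: within the TTL the caller may get stale data, because the server is never asked.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical TTL cache keyed by document id.
    public sealed class TtlCache
    {
        private readonly Func<string, string> fetch;
        private readonly Func<DateTime> now;
        private readonly TimeSpan ttl;
        private readonly Dictionary<string, (DateTime FetchedAt, string Body)> entries =
            new Dictionary<string, (DateTime FetchedAt, string Body)>();

        public TtlCache(Func<string, string> fetch, Func<DateTime> now, TimeSpan ttl)
        {
            this.fetch = fetch;
            this.now = now;
            this.ttl = ttl;
        }

        public string Load(string id)
        {
            if (entries.TryGetValue(id, out var hit) && now() - hit.FetchedAt < ttl)
            {
                return hit.Body; // fresh enough: no request at all
            }
            string body = fetch(id);
            entries[id] = (now(), body);
            return body;
        }
    }

    public static class Program
    {
        public static void Main()
        {
            int requests = 0;
            DateTime clock = new DateTime(2020, 1, 1, 0, 0, 0, DateTimeKind.Utc);
            var cache = new TtlCache(
                id => { requests++; return "body of " + id; },
                () => clock,
                TimeSpan.FromMinutes(5));

            cache.Load("foos/123");      // request 1
            cache.Load("foos/123");      // within the TTL: served locally
            clock = clock.AddMinutes(6);
            cache.Load("foos/123");      // TTL expired: request 2

            Console.WriteLine(requests); // 2
        }
    }
    ```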
    

    Runtime Configuration?

    We mentioned in a previous blog post a runtime configuration setup that we've provided our ops team with. The TTL for Raven's aggressive caching seemed like a prime candidate for it, so we wired it up much the same way as the output caching runtime configuration from the other blog post.

    Clever

    Now, to use output caching, it was a simple matter of applying the [ConfiguredOutputCache] attribute to our controller actions. However, with the Raven data caching, it's a violation of DRY to have to open an aggressive caching context and pass in a runtime configuration value everywhere it's needed. So, with that in mind, we came up with extension methods to encapsulate that behavior. We thought this was very clever, but it actually turned out to be quite stupid. Can you spot the problem?

    public static class DataCachingExtensions
    {
        public static T LoadAndCache<T>(this IDocumentSession session, string id)
        {
            using (session.Advanced.DocumentStore.AggressivelyCacheFor(
                TimeSpan.FromSeconds(CacheSettings.RavenAggressiveCachingDurationSeconds)))
            {
                return session.Load<T>(id);
            }
        }
    
        public static IRavenQueryable<T> QueryAndCache<T>(this IDocumentSession session)
        {
            using (session.Advanced.DocumentStore.AggressivelyCacheFor(
                TimeSpan.FromSeconds(CacheSettings.RavenAggressiveCachingDurationSeconds)))
            {
                return session.Query<T>();
            }
        }
    }
    
    ...
    
    session.LoadAndCache<Foo>("foos/123");
    
    ...
    
    session.QueryAndCache<Foo>().Where(f => f.Bar == "Baz").ToList();
    
    

    The first extension method is fine, but the second one doesn't do anything at all. Why?

    It's because Raven doesn't actually send the HTTP request until the query is evaluated, which here happens at .ToList(). By then we've already returned from inside the aggressive caching context, so the context was disposed before the request was ever made, resulting in no caching.
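
    The same trap can be reproduced without Raven at all. In this sketch, CachingScope is a hypothetical stand-in for the aggressive caching context (just a flag that is on until Dispose), and a lazy C# iterator stands in for the query, since neither does any work until it is enumerated.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical stand-in for an aggressive caching context: a flag that is
    // on only between construction and Dispose.
    public sealed class CachingScope : IDisposable
    {
        public static bool Active { get; private set; }
        public CachingScope() { Active = true; }
        public void Dispose() { Active = false; }
    }

    public static class Demo
    {
        // Lazy sequence: the body does not run until the caller enumerates it,
        // much like a query that sends no request until ToList().
        public static IEnumerable<string> Query()
        {
            yield return CachingScope.Active ? "cached" : "not cached";
        }

        // Mirrors QueryAndCache: the scope is disposed on return, before enumeration.
        public static IEnumerable<string> QueryInsideHelper()
        {
            using (new CachingScope())
            {
                return Query();
            }
        }

        public static void Main()
        {
            string fromHelper = QueryInsideHelper().First();

            string fromCaller;
            using (new CachingScope())
            {
                fromCaller = Query().First(); // enumerated while the scope is still open
            }

            Console.WriteLine(fromHelper); // not cached
            Console.WriteLine(fromCaller); // cached
        }
    }
    ```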

    So after feeling pretty silly, we restructured the extension method to simply return the aggressive caching context, so that the caller can encapsulate the full query including its execution.

    public static class DataCachingExtensions
    {
        public class NonCachingContext : IDisposable
        {
            public void Dispose() { }
        }
    
        public static IDisposable GetCachingContext(this IDocumentSession session)
        {
            if(CacheSettings.RavenAggressiveCachingDurationSeconds == 0)
            {
                return new NonCachingContext();
            }
    
            return session.Advanced.DocumentStore.AggressivelyCacheFor(
                TimeSpan.FromSeconds(CacheSettings.RavenAggressiveCachingDurationSeconds));
        }
    }
    
    ...
    
    using (var session = store.OpenSession())
    {
        using (session.GetCachingContext())
        {
            session.Query<Foo>().Where(f => f.Bar == "Baz").ToList();
        }
    }
    

    It's worth pointing out that the caching context, when used with a query, does not cache the individual documents returned; it caches only the query and its response. So a subsequent cache-enabled .Load for a document that came back from a cache-enabled query will still make a request, unless that document was already cached by an earlier .Load call.

    Happy coding!

  • Performance on-demand; Giving your ops team runtime flexibility

    Performance On Demand

    Pretend you are in an operations position, in which your job is to maintain the infrastructure that routes traffic and the servers that serve requests. Wouldn't it be nice if, when you suddenly had a surge in traffic or a drop in available server hardware (be it expected or unexpected), you could alter the performance characteristics of your web applications?

    This is a problem we've been tackling with our new set of web apps, and we think we've got a pretty good solution in place.

    Operations Administration Panel for Runtime Configuration

    For starters, we've created an administration web application for our operations folks, whose primary purpose is that of runtime configuration. Operations can control various aspects of our systems from this application, including:

    • Logging levels
    • Caching TTLs
    • Database masters/slaves and replication strategies
    • Application settings
    • Network locations for editorial assets
    • Logical service bus participants
    • Etc.

    When any of these settings are updated, we send a message on the service bus informing subscribers of changes in the settings they care about. Let's look at one of them, the caching TTLs, which gives ops a way to dial in performance on demand.

    Output Caching

    In the administration panel, we've provided a settings page where the output caching TTL and data caching TTL can be set for a given application. When this setting is updated, we publish a message on the service bus, which our front end rendering ASP.NET MVC application can subscribe to.

    Creating a handler in the rendering application then is pretty easy. We listen for the settings type that corresponds to caching:

        public class CacheSettingsUpdater : SettingsChangedHandler<CacheSettingsData>
        {
            protected override bool ShouldHandle(string id)
            {
                return string.Equals(
                    id,
                    CacheSettingsData.StorageId,
                    StringComparison.OrdinalIgnoreCase);
            }
    
            protected override void Update(CacheSettingsData settingsData)
            {
                CacheSettings.UpdateSettings(settingsData);
            }
        }
    

     

    As you can see, the handler informs a settings class by calling its "UpdateSettings" method, and that class keeps a reference to the latest data.

        public static class CacheSettings
        {
            private static CacheSettingsData Data = new CacheSettingsData();
    
            public static int OutputCacheDurationSeconds
            {
                get
                {
                    return
                        Data.CurrentAppCacheParameters
                            .OutputCacheDurationSeconds;
                }
            }
    
            internal static void UpdateSettings(CacheSettingsData data)
            {
                Data = data;
            }
        }
    

     

    Leveraging it with OutputCacheAttribute

    Now, in ASP.NET MVC, there is an action filter for output caching: OutputCacheAttribute. This attribute can be applied at the controller level or at the individual action level. When an action runs for the first time, the framework caches the result, so that the next request is delivered from cache without being processed again. The cached item is served until the TTL/Duration expires. The effect is that your application won't be processing every request, and will therefore be able to serve more requests.

    The issue with connecting our runtime configuration class from above (CacheSettings) to the OutputCacheAttribute is that attribute arguments must be compile-time constants, like so:

        [OutputCache(Duration = 10)]
        public ActionResult Index()
    

     

    So, we need to instead create our own action filter, which inherits from OutputCacheAttribute, so we can control where it gets its values from. I've simplified this for brevity to just illustrate the Duration extensibility point.

        public class ConfiguredOutputCacheAttribute : OutputCacheAttribute
        {
            public new int Duration
            {
                get { return base.Duration; }
                set
                {
                    throw new NotSupportedException(
                        "Duration cannot be set directly. " +
                        "Set from runtime config.");
                }
            }
    
            public ConfiguredOutputCacheAttribute()
            {
                base.Duration = CacheSettings.OutputCacheDurationSeconds;
            }
    
            public override void OnActionExecuting(
                ActionExecutingContext filterContext)
            {
                base.Duration = CacheSettings.OutputCacheDurationSeconds;
                base.OnActionExecuting(filterContext);
            }
        }
    

     

    As you can see, when we hit OnActionExecuting, we check the CacheSettings class for the current output cache duration and set it on the base OutputCacheAttribute class we inherited from. The effect is that operations can control the cache TTL during day-to-day traffic.

    Then we just apply it where we want to cache:

        [ConfiguredOutputCache]
        public ActionResult Index()
    

     

    Well, how did we do?

    Let's see what it looks like if I simulate light traffic load. The red line indicates request execution time.

    Turning on output caching results in a dramatic drop off in request execution time.

    The dramatic drop off occurred when I went into the operations administration panel and changed the TTL. The spikes every 10 seconds following the drop off are when the cache duration TTL expired, forcing the page to actually process again.

    There are a number of things we can do to enhance the flexibility of this system. For example, we could specify groupings in the operations administration panel that correspond to cache policies, and then simply specify on each instance of our attribute which policy we'd like to use:

        [ConfiguredOutputCache(CachePolicy = "FooCachePolicy")]
        public ActionResult Index(string streamSlug)
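
    One way such a policy lookup could work is a name-to-duration table that is swapped out whenever new settings arrive. This is a sketch under assumptions, not our implementation: CachePolicies, Update, DurationFor and the "unknown policy means no caching" default are all hypothetical.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical policy table; it would be replaced whenever ops publish new settings.
    public static class CachePolicies
    {
        // Assumption: an unknown policy name means "do not cache".
        public const int DefaultDurationSeconds = 0;

        private static volatile IReadOnlyDictionary<string, int> durations =
            new Dictionary<string, int>();

        public static void Update(IDictionary<string, int> latest)
        {
            // Copy, then swap the reference, so readers never see a half-updated table.
            durations = new Dictionary<string, int>(latest, StringComparer.OrdinalIgnoreCase);
        }

        public static int DurationFor(string policyName)
        {
            int seconds;
            return policyName != null && durations.TryGetValue(policyName, out seconds)
                ? seconds
                : DefaultDurationSeconds;
        }
    }

    public static class Program
    {
        public static void Main()
        {
            CachePolicies.Update(new Dictionary<string, int> { { "FooCachePolicy", 10 } });

            Console.WriteLine(CachePolicies.DurationFor("FooCachePolicy")); // 10
            Console.WriteLine(CachePolicies.DurationFor("Unknown"));        // 0
        }
    }
    ```

    The attribute would then call something like DurationFor(CachePolicy) in OnActionExecuting instead of reading a single global duration.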
    

     

    We think this feature will be particularly valuable in situations where we need more performance on demand, and look forward to extending it to have more flexibility as needed.

    Happy coding!

  • Battling the Fallacies of Distributed Computing with RavenDB

    Recently, I deployed some code that had the following requirements:

    • When post {x} is first published in CMS A, import a summary of {x} into CMS B
    • When {x} is subsequently updated and re-published in CMS A, do nothing

    Seemed pretty simple.  Due to the limited API support in CMS B, I used RavenDB to maintain a record of posts I had already imported from CMS A to CMS B in order to honor the second requirement.

    Worked on my machine

    Immediately after deploying, I basked for about 30 seconds in the praise from our editors. Moments later, I started receiving reports of duplicate posts showing up in CMS B. I was flabbergasted. I had been careful to handle the dupe scenario in code.

    I checked the code again. By design, the code prevents dupes from being created in CMS B... unless the duplicates arrive within a few milliseconds of each other. Fail.

    When I looked at the server logs, the duplicate notifications were indeed happening less than a millisecond apart. For a few minutes, I thought about how I might prevent the duplicate publish notifications. Ultimately, I accepted that the first two fallacies of distributed computing really are fallacies:

     

    1. The network is reliable
    2. Latency is zero

     

    Duplicate notifications arriving a couple of milliseconds apart are a fact of life. Deal with it.

    Raven etags and concurrency control to the rescue

    One of my favorite aspects of Raven is that it’s ACID when you need it, BASE when you don’t. Here’s how we made it really ACID-y to solve the duplicate import problem:

    1. When a notification comes in, check Raven to see if we’ve already imported the post.
    2. If the post has never been seen, create a new document in Raven with a null etag and optimistic concurrency enabled. Used this way, the Raven client will throw an exception on save if anyone else has already created the same document. Here’s the code:

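        using (var session = store.OpenSession())
        {
            // fail the save instead of overwriting if the document already exists
            session.Advanced.UseOptimisticConcurrency = true;

            var post = new Post()
            {
                Id = id,
                ImportStatus = ImportStatus.ImportStarted,
                ImportStartedAtUtc = DateTimeOffset.UtcNow
            };

            // null etag: we expect no existing document with this id
            session.Store(post, null);
            session.SaveChanges();
        }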

       

    3. Send a message using our service bus to actually perform the import. Though not related to RavenDB directly, see Jimmy Bogard’s post on how to use messaging patterns to interop transactionally with non-transactional systems (in my case, the CMS B APIs do not participate in a distributed transaction, so we had another source of dupes when message failures were retried after a transaction rollback).

    Steps 1-3 are wrapped in a distributed transaction. When a simultaneous duplicate notification occurs, step 2 will fail for all but one of the notifications. All the failed transactions get rolled back and dupes no longer show up in CMS B.
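
    The "all but one will fail" behavior of step 2 can be illustrated without Raven. In this sketch a ConcurrentDictionary stands in for the document store (an assumption for illustration only): TryAdd succeeds for exactly one caller per key, which is the same first-writer-wins guarantee the optimistic-concurrency insert provides. ImportGate and TryBeginImport are hypothetical names.

    ```csharp
    using System;
    using System.Collections.Concurrent;
    using System.Threading;
    using System.Threading.Tasks;

    // Hypothetical in-memory stand-in for the record of imported posts.
    public static class ImportGate
    {
        private static readonly ConcurrentDictionary<string, DateTimeOffset> started =
            new ConcurrentDictionary<string, DateTimeOffset>();

        // True for the first caller with a given post id, false for every later one.
        public static bool TryBeginImport(string postId)
        {
            return started.TryAdd(postId, DateTimeOffset.UtcNow);
        }
    }

    public static class Program
    {
        public static void Main()
        {
            int imports = 0;

            // Eight duplicate notifications for the same post, handled concurrently.
            Parallel.For(0, 8, _ =>
            {
                if (ImportGate.TryBeginImport("posts/42"))
                {
                    Interlocked.Increment(ref imports); // only the winner sends the import message
                }
            });

            Console.WriteLine(imports); // 1
        }
    }
    ```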