RavenDB: Lessons Learned: Query Includes and Projections

By default, RavenDB will only allow 30 requests per session. This is part of RavenDB's "Safe by default" behaviors, to prevent you from making a giant number of RavenDB HTTP requests, which would be a performance quagmire.

Let's say you have an object graph that you are retrieving from RavenDB that contains referenced documents, and it looks like this:

stories/123
{
  "Headline": "New iPad is key to Apple's bottom line",
  "Author": "Jack Smith",
  "LastPublishedAtUtc": "2012-03-14T23:48:00.0000000+00:00",
  "PublishStatus": "Published",
  "StoryReferences":
  {
    "storyreference/456213":
    {
      "Headline": "New iPhone coming soon",
      "Author": "John Doe"
    },
    "storyreference/789654":
    {
      "Headline": "New iPad foils reviewers' attempts to find legitimate faults",
      "Author": "Jane Doe"
    },
    "storyreference/555111":
    {
      "Headline": "Now on Netflix: Search by TV network",
      "Author": "Jack Smith"
    },
    "storyreference/942342":
    {
      "Headline": "Apple stores to open at 8am for iPad launch",
      "Author": "John Doe"
    },
    ...
  }
}

And let's say you are interested in getting a small subset of data about the referenced stories for display with the base story. What you DON'T want to do is something like this:


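// Story stands for the document class (the type name is assumed here).
var story = session.Load<Story>("stories/123");

foreach (var storyReference in story.StoryReferences)
{
    // Each Load in this loop is a separate HTTP request to the server.
    var otherStory = session.Load<Story>(storyReference.Id);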
    // ... do something with otherStory ...
}

That will result in the following HTTP traffic back to Raven:

  1. Make a request for 'stories/123'
  2. Make a request for 'stories/456213'
  3. Make a request for 'stories/789654'
  4. Make a request for 'stories/555111'
  5. Make a request for 'stories/942342'
  6. ...etc...

You'll consume unnecessary bandwidth and pay the latency cost of every individual HTTP request. What you really want is for the client to make a single HTTP request. Fortunately, RavenDB allows you to do that with Includes. A RavenDB include says "Hey server, go get this for me, but before you give it back to me, gather up these other things and return them in the same response so I can deal with them in a moment".
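As a rough sketch of what that looks like for the loop above: the `Story` and `StoryReference` classes, the `IncludeExample` wrapper, and the already-initialized `IDocumentStore` are assumptions made for illustration, and the `Raven.Client` namespace is the one used by the client versions of this era. It also assumes `StoryReferences` deserializes as a collection of objects that each carry the `Id` of another story document.

```csharp
using System.Collections.Generic;
using Raven.Client;

// Hypothetical document classes, assumed for illustration only.
public class StoryReference
{
    public string Id { get; set; }
    public string Headline { get; set; }
    public string Author { get; set; }
}

public class Story
{
    public string Id { get; set; }
    public string Headline { get; set; }
    public List<StoryReference> StoryReferences { get; set; }
}

public static class IncludeExample
{
    // 'store' is assumed to be an already-initialized IDocumentStore.
    public static void ShowRelatedStories(IDocumentStore store)
    {
        using (var session = store.OpenSession())
        {
            // One HTTP request: the story itself, plus every document named
            // by the Id property of each item in StoryReferences.
            var story = session
                .Include("StoryReferences,Id")
                .Load<Story>("stories/123");

            foreach (var storyReference in story.StoryReferences)
            {
                // The included documents are already in the session,
                // so this Load should not go back to the server.
                var otherStory = session.Load<Story>(storyReference.Id);
                // ... do something with otherStory ...
            }

            // session.Advanced.NumberOfRequests is expected to still be 1 here.
        }
    }
}
```

Checking `session.Advanced.NumberOfRequests` while debugging is a quick way to confirm that the include actually took effect.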

A few weeks back we had some code that was hitting the 30 requests per session limit. At first we couldn't understand why, since we do a pretty good job of making sure we only make 1 or 2 requests via Includes. Upon further inspection, it turned out we had misunderstood something about the RavenDB client API.

What's the problem?

Suppose we have an index that produces projections: on the server side it emits an anonymous entity containing a flattened "StoryReferenceIds" field, like this (this is a contrived example):


// Story is the document class being indexed (the type name is assumed here).
public class Stories_ByReferencedStories : AbstractIndexCreationTask<Story>
{
    public class Result
    {
        public string Headline { get; set; }
        public DateTimeOffset? LastPublishedAtUtc { get; set; }
        public IEnumerable<string> StoryReferenceIds { get; set; }
    }

    public Stories_ByReferencedStories()
    {
        this.Map = stories => from story in stories
                              select new
                              {
                                  Headline = story.Headline,
                                  LastPublishedAtUtc = story.LastPublishedAtUtc,
                                  StoryReferenceIds = story.StoryReferences.Select(x => x.Id),
                              };
    }
}

Against that index, we had previously written Lucene queries like the following:


session.Advanced.LuceneQuery<Story, Stories_ByReferencedStories>()
  .WhereStartsWith("Headline", text)
  .OrderBy("-LastPublishedAtUtc")
  .Include("StoryReferenceIds") // an index field, not a property of the story document

However, it turns out that the last line, the Include() call, doesn't do anything at all. Include() operates on the documents identified by the index entries, NOT on the projection. In other words, the include path is evaluated against the stories returned by the query, and those documents have no "StoryReferenceIds" property.

So, with that in mind, what we actually want is something like this:


.Include("StoryReferences,Id");

The syntax with the comma may look a little funny, but what it means is "For the StoryReferences collection, include the document identified by the Id property of each item in it". So if you had a story with 45 referenced stories in it, instead of making 46 requests back to Raven, you would make only 1 request. That's much better.
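Put together, the corrected query might look something like the sketch below. The `StoryQueries` class and `FindWithReferences` method are hypothetical names; the method is generic over the document and index types only so that the snippet stands on its own, and the `Raven.Client` namespaces are the ones from the client versions of this era.

```csharp
using System.Collections.Generic;
using System.Linq;
using Raven.Client;
using Raven.Client.Indexes;

public static class StoryQueries
{
    // Hypothetical helper: TStory is the document class and TIndex is the
    // index to query (for example Stories_ByReferencedStories).
    public static List<TStory> FindWithReferences<TStory, TIndex>(
        IDocumentSession session, string text)
        where TIndex : AbstractIndexCreationTask, new()
    {
        return session.Advanced
            .LuceneQuery<TStory, TIndex>()
            .WhereStartsWith("Headline", text)
            .OrderBy("-LastPublishedAtUtc")
            // Path into the stored document, not into the index projection:
            // for each item in StoryReferences, include the document
            // named by its Id property.
            .Include("StoryReferences,Id")
            .ToList();
    }
}
```

After a call such as `StoryQueries.FindWithReferences<Story, Stories_ByReferencedStories>(session, text)`, loading any of the referenced stories by Id through the same session should be answered from the session itself rather than by another request.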

Happy coding.
