Monday 3 September 2012

SharePoint 2010: List Throttling, Indexing and ContentIterator


One of the new features Microsoft added to SharePoint 2010 is List Throttling. I am not going to explain it, you can find the details here.

The limit set for non-administrator users (5000) is a small number and it often gets crossed. Now as soon as this number of items goes beyond 5000, CAML queries to such lists stop working and start throwing SPQueryThrottledException even though they are returning single item.

There are many ways to override List throttling behavior like increase the list throttling threshold or disable list throttling for such large list, but they are not recommended. Hence farm administrator does not allow them to happen.  So how to get our queries working? Well, there are two approaches which apply to two different scenarios.

First scenario: Number of items returned by CAML is less than List throttling limits
In this case, indexing of the fields used in where clause and setting the RowLimit (<list threshold) to SPQuery object makes CAML queries working. You can find nice explanation on how indexing solves the problem here. It says list threshold is nothing but number of records being processed by the query. As after indexing records are stored in sorted order, setting the RowLimit ensures number of records processed are less than list threshold value, thus avoids throttling exception.

Second scenario: Number of items returned by CAML are more than List throttling limits
In this scenario, alone indexing does not solve the problem. ContentIterator needs to be used which has capability to fetch the records in batches (pages) less than that of list threshold. If you have worked in MOSS 2007 then you must be aware of SPQuery.ListItemCollectionPosition which helps to fetch large number of items efficiently in the batches. ContentIterator.ProcessListItems method make use of SPQuery.ListItemCollectionPosition internally in a such way that its value is less than the list threshold value. Hence when used along with Indexing, ContentIterator successfully fetches large number of items without throwing SPQueryThrottledException. The number of records processed (set using SPQuery.ListItemCollectionPosition) is always less than list threshold value.

However we need to tell ContentIterator explicitly to make use of indexing by appending ContentIterator.ItemEnumerationOrderByNVPField order by clause in CAML query. Failing to do so, as Contentiterator sets SPQuery.QueryThrottleMode to SPQueryThrottleOption.Strict internally, list threshold of normal users will be applied even for administrators.

I carried out some tests on a list having 10000 records with normal account. Here are the results.

Test 1: Query a non-index column
Code:
SPList announcementsList = SPContext.Current.Web.Lists["Announcements"];
string strCAMLQuery = "<Where><Contains><FieldRef Name=\"Title\" /><Value Type='Text'>Annoucement 1</Value></Contains></Where>";
           
//Traditional Approach
try
{
    var spQuery = new SPQuery();
    spQuery.Query = strCAMLQuery;
    SPListItemCollection itemCollection = announcementsList.GetItems(spQuery);
    items = itemCollection.Count;
    this.Controls.Add(new LiteralControl("Traditional approach without indexed column in Where clause: - " + items.ToString() + "<br/>"));
}
catch (Exception ex)
{
    this.Controls.Add(new LiteralControl("Traditional approach without indexed column in Where clause: - " + ex.Message + "<br/>"));
}
this.Controls.Add(new LiteralControl("<br/>"));

//ContentIterator Individual processing
try
{
    items = 0;
    var spQuery = new SPQuery();
    spQuery.Query = strCAMLQuery;
    ContentIterator iterator = new ContentIterator();
    iterator.ProcessListItems(announcementsList, spQuery, ProcessIndividualItem, ProcessIndividualError);
    this.Controls.Add(new LiteralControl("ContentIterator Individual Processing without indexed column in Where clause: - " + items.ToString() + "<br/>"));
}
catch (Exception ex)
{
    this.Controls.Add(new LiteralControl("ContentIterator Individual Processing without indexed column in Where clause: - " + ex.Message + "<br/>"));
}
this.Controls.Add(new LiteralControl("<br/>"));

//ContentIterator batch processing
try
{
    items = 0;
    var spQuery = new SPQuery();
    spQuery.Query = strCAMLQuery;
    ContentIterator iterator = new ContentIterator();
    iterator.ProcessListItems(announcementsList, strCAMLQuery, (uint)2000, true, ProcessItemsInBatch, ProcessErrorsInBatch);
    this.Controls.Add(new LiteralControl("ContentIterator Batch Processing without indexed column in Where clause: - " + items.ToString() + "<br/>"));
}
catch (Exception ex)
{
    this.Controls.Add(new LiteralControl("ContentIterator Batch Processing without indexed column in Where clause: - " + ex.Message + "<br/>"));
}
this.Controls.Add(new LiteralControl("<br/>"));


Outcome:
Figure 1: Test 1 result

Test 2: Query a indexed column, ContentIterator.ItemEnumerationOrderByNVPField order by clause and less than 5000 results
Code:
SPList announcementsList = SPContext.Current.Web.Lists["Announcements"];
string strCAMLQuery = "<Where><Contains><FieldRef Name=\"IndexedTitle\" /><Value Type='Text'>Annoucement 1</Value></Contains></Where>" + ContentIterator.ItemEnumerationOrderByNVPField;

//Traditional Approach
try
{
    var spQuery = new SPQuery();
    spQuery.Query = strCAMLQuery;
    SPListItemCollection itemCollection = announcementsList.GetItems(spQuery);
    items = itemCollection.Count;
    this.Controls.Add(new LiteralControl("Traditional approach with indexed column in Where clause: - " + items.ToString() + "<br/>"));
}
catch (Exception ex)
{
    this.Controls.Add(new LiteralControl("Traditional approach with indexed column in Where clause: - " + ex.Message + "<br/>"));
}
this.Controls.Add(new LiteralControl("<br/>"));

//ContentIterator Individual processing
try
{
    items = 0;
    var spQuery = new SPQuery();
    spQuery.Query = strCAMLQuery;
    ContentIterator iterator = new ContentIterator();
    iterator.ProcessListItems(announcementsList, spQuery, ProcessIndividualItem, ProcessIndividualError);
    this.Controls.Add(new LiteralControl("ContentIterator Individual Processing with indexed column in Where clause: - " + items.ToString() + "<br/>"));
}
catch (Exception ex)
{
   this.Controls.Add(new LiteralControl("ContentIterator Individual Processing with indexed column in Where clause: - " + ex.Message + "<br/>"));
}
this.Controls.Add(new LiteralControl("<br/>"));

//ContentIterator batch processing
try
{
    items = 0;
    var spQuery = new SPQuery();
    spQuery.Query = strCAMLQuery;
    ContentIterator iterator = new ContentIterator();
    iterator.ProcessListItems(announcementsList, strCAMLQuery, (uint)2000, true, ProcessItemsInBatch, ProcessErrorsInBatch);
    this.Controls.Add(new LiteralControl("ContentIterator Batch Processing with indexed column in Where clause: - " + items.ToString() + "<br/>"));
}
catch (Exception ex)
{
    this.Controls.Add(new LiteralControl("ContentIterator Batch Processing with indexed column in Where clause: - " + ex.Message + "<br/>"));
}
this.Controls.Add(new LiteralControl("<br/>"));


Outcome:
Figure 2 : Test 2 result

Test 3: Query a indexed column, ContentIterator.ItemEnumerationOrderByNVPField order by clause and more than 5000 results
Code:
SPList announcementsList = SPContext.Current.Web.Lists["Announcements"];
string strCAMLQuery = "<Where><Contains><FieldRef Name=\"IndexedTitle\" /><Value Type='Text'>Annoucement</Value></Contains></Where>" + ContentIterator.ItemEnumerationOrderByNVPField;

//Traditional Approach
try
{
    var spQuery = new SPQuery();
    spQuery.Query = strCAMLQuery;
    SPListItemCollection itemCollection = announcementsList.GetItems(spQuery);
    items = itemCollection.Count;
    this.Controls.Add(new LiteralControl("Traditional approach with indexed column in Where clause: - " + items.ToString() + "<br/>"));
}
catch (Exception ex)
{
    this.Controls.Add(new LiteralControl("Traditional approach with indexed column in Where clause: - " + ex.Message + "<br/>"));
}
this.Controls.Add(new LiteralControl("<br/>"));

//ContentIterator Individual processing
try
{
    items = 0;
    var spQuery = new SPQuery();
    spQuery.Query = strCAMLQuery;
    ContentIterator iterator = new ContentIterator();
    iterator.ProcessListItems(announcementsList, spQuery, ProcessIndividualItem, ProcessIndividualError);
    this.Controls.Add(new LiteralControl("ContentIterator Individual Processing with indexed column in Where clause: - " + items.ToString() + "<br/>"));
}
catch (Exception ex)
{
    this.Controls.Add(new LiteralControl("ContentIterator Individual Processing with indexed column in Where clause: - " + ex.Message + "<br/>"));
}
this.Controls.Add(new LiteralControl("<br/>"));

//ContentIterator batch processing
try
{
    items = 0;
    var spQuery = new SPQuery();
    spQuery.Query = strCAMLQuery;
    ContentIterator iterator = new ContentIterator();
    iterator.ProcessListItems(announcementsList, strCAMLQuery, (uint)2000, true, ProcessItemsInBatch, ProcessErrorsInBatch);
    this.Controls.Add(new LiteralControl("ContentIterator Batch Processing with indexed column in Where clause: - " + items.ToString() + "<br/>"));
}
catch (Exception ex)
{
    this.Controls.Add(new LiteralControl("ContentIterator Batch Processing with indexed column in Where clause: - " + ex.Message + "<br/>"));
}
this.Controls.Add(new LiteralControl("<br/>"));


Outcome:
Figure 3 : Test 3 result

Note that ContentIterator individual as well batch processing worked fine without any problem and returned number of records more than that of list threshold value. This proves that both approaches for ContentIterator overcome list throttling issues. Here ContentIterator fetches the data into pages using SPQuery.ListItemCollectionPosition attribute. Please see the internal code for ContentIterator.ProcessListItems method for better understanding below.

query.ListItemCollectionPosition ="Row limit set in code(default value=200)";
do
{
    SPListItemCollection items = list.GetItems(query);
    int count = items.Count; num++;
    try {
        itemsProcessor(items);
        this.OnProcessedListItemsBatch(str1, items, num, count);
    }
    catch (Exception exception1)
    {
        Exception exception = exception1;
        if (errorCallout == null || errorCallout(items, exception))
        { throw; }
    }
    if (!this.ShouldCancel(IterationGranularity.Item))
    {
        query.ListItemCollectionPosition = items.ListItemCollectionPosition;
    } else { return; }
} while (query.ListItemCollectionPosition != null);


Test 4: Query a indexed column, NO ContentIterator.ItemEnumerationOrderByNVPField order by clause, more than 5000 results and Administrator user
Code:
SPList announcementsList = SPContext.Current.Web.Lists["Announcements"];
string strCAMLQuery = "<Where><Contains><FieldRef Name=\"IndexedTitle\" /><Value Type='Text'>Annoucement</Value></Contains></Where>";

//Traditional Approach
try
{
    var spQuery = new SPQuery();
    spQuery.Query = strCAMLQuery;
    SPListItemCollection itemCollection = announcementsList.GetItems(spQuery);
    items = itemCollection.Count;
    this.Controls.Add(new LiteralControl("Traditional approach with indexed lookup column in Where clause: - " + items.ToString() + "<br/>"));
}
catch (Exception ex)
{
    this.Controls.Add(new LiteralControl("Traditional approach with indexed lookup column in Where clause: - " + ex.Message + "<br/>"));
}
this.Controls.Add(new LiteralControl("<br/>"));

//ContentIterator Individual processing
try
{
    items = 0;
    var spQuery = new SPQuery();
    spQuery.Query = strCAMLQuery;
    ContentIterator iterator = new ContentIterator();
    iterator.ProcessListItems(announcementsList, spQuery, ProcessIndividualItem, ProcessIndividualError);
    this.Controls.Add(new LiteralControl("ContentIterator Individual Processing with indexed lookup column in Where clause: - " + items.ToString() + "<br/>"));
}
catch (Exception ex)
{
    this.Controls.Add(new LiteralControl("ContentIterator Individual Processing with indexed lookup column in Where clause: - " + ex.Message + "<br/>"));
}
this.Controls.Add(new LiteralControl("<br/>"));

//ContentIterator batch processing
try
{
    items = 0;
    var spQuery = new SPQuery();
    spQuery.Query = strCAMLQuery;
    ContentIterator iterator = new ContentIterator();
    iterator.ProcessListItems(announcementsList, strCAMLQuery, (uint)2000, true, ProcessItemsInBatch, ProcessErrorsInBatch);
    this.Controls.Add(new LiteralControl("ContentIterator Batch Processing with indexed lookup column in Where clause: - " + items.ToString() + "<br/>"));
}
catch (Exception ex)
{
    this.Controls.Add(new LiteralControl("ContentIterator Batch Processing with indexed lookup column in Where clause: - " + ex.Message + "<br/>"));
}
this.Controls.Add(new LiteralControl("<br/>"));


Outcome:
Figure 4 : Test 4 result

As user used is an administrator and number of items in list is less than 20k, traditional approach worked fine but ContentIerator failed. This test proves that when ContentIterator is used, even for administrators, list threshold limits of normal users will be applied. This is because ContentIterator sets the SPQuery.QueryThrottleMode to SPQueryThrottleOption.Strict internally.

Test 5: Query a indexed lookup column, ContentIterator.ItemEnumerationOrderByNVPField order by clause, more than 200 and less than 5000 results
Code:
SPList announcementsList = SPContext.Current.Web.Lists["Announcements"];
string strCAMLQuery = "<Where><Eq><FieldRef Name='LookupColumn2' LookupId='True' /><Value Type='Lookup'>1</Value></Eq></Where>" + ContentIterator.ItemEnumerationOrderByNVPField;

//Traditional Approach
try
{
    var spQuery = new SPQuery();
    spQuery.Query = strCAMLQuery;
    SPListItemCollection itemCollection = announcementsList.GetItems(spQuery);
    items = itemCollection.Count;
    this.Controls.Add(new LiteralControl("Traditional approach with indexed lookup column in Where clause: - " + items.ToString() + "<br/>"));
}
catch (Exception ex)
{
    this.Controls.Add(new LiteralControl("Traditional approach with indexed lookup column in Where clause: - " + ex.Message + "<br/>"));
}
this.Controls.Add(new LiteralControl("<br/>"));

//ContentIterator Individual processing
try
{
    items = 0;
    var spQuery = new SPQuery();
    spQuery.Query = strCAMLQuery;
    ContentIterator iterator = new ContentIterator();
    iterator.ProcessListItems(announcementsList, spQuery, ProcessIndividualItem, ProcessIndividualError);
    this.Controls.Add(new LiteralControl("ContentIterator Individual Processing with indexed lookup column in Where clause: - " + items.ToString() + "<br/>"));
}
catch (Exception ex)
{
    this.Controls.Add(new LiteralControl("ContentIterator Individual Processing with indexed lookup column in Where clause: - " + ex.Message + "<br/>"));
}
this.Controls.Add(new LiteralControl("<br/>"));

//ContentIterator batch processing
try
{
    items = 0;
    var spQuery = new SPQuery();
    spQuery.Query = strCAMLQuery;
    ContentIterator iterator = new ContentIterator();
    iterator.ProcessListItems(announcementsList, strCAMLQuery, (uint)2000, true, ProcessItemsInBatch, ProcessErrorsInBatch);
    this.Controls.Add(new LiteralControl("ContentIterator Batch Processing with indexed lookup column in Where clause: - " + items.ToString() + "<br/>"));
}
catch (Exception ex)
{
    this.Controls.Add(new LiteralControl("ContentIterator Batch Processing with indexed lookup column in Where clause: - " + ex.Message + "<br/>"));
}
this.Controls.Add(new LiteralControl("<br/>"));


Outcome:
Figure 5 : Test 5 result

The traditional approach returned correct number of items. ContentIterator batch processing also returned desired number of items because the row limit set for it (2000) is more than expected results (1000).
However, ContentIterator individual processing returned 200 items only which is wrong. This behavior is reflected only when expected query is supposed to return number of items between 200 and 5000 (list throttling threshold). If you note I have not set the row limit for SPQuery object. If not set explicitly, ContentIterator internally set it as 200 and hence is the result count.

Test 6: Query a indexed lookup column, ContentIterator.ItemEnumerationOrderByNVPField order by clause and more than 5000 results
Code:
SPList announcementsList = SPContext.Current.Web.Lists["Announcements"];
string strCAMLQuery = "<Where><Eq><FieldRef Name='LookupColumn1' LookupId='True' /><Value Type='Lookup'>1</Value></Eq></Where>" + ContentIterator.ItemEnumerationOrderByNVPField;

//Traditional Approach
try
{
    var spQuery = new SPQuery();
    spQuery.Query = strCAMLQuery;
    SPListItemCollection itemCollection = announcementsList.GetItems(spQuery);
    items = itemCollection.Count;
    this.Controls.Add(new LiteralControl("Traditional approach with indexed lookup column in Where clause: - " + items.ToString() + "<br/>"));
}
catch (Exception ex)
{
    this.Controls.Add(new LiteralControl("Traditional approach with indexed lookup column in Where clause: - " + ex.Message + "<br/>"));
}
this.Controls.Add(new LiteralControl("<br/>"));

//ContentIterator Individual processing
try
{
    items = 0;
    var spQuery = new SPQuery();
    spQuery.Query = strCAMLQuery;
    ContentIterator iterator = new ContentIterator();
    iterator.ProcessListItems(announcementsList, spQuery, ProcessIndividualItem, ProcessIndividualError);
    this.Controls.Add(new LiteralControl("ContentIterator Individual Processing with indexed lookup column in Where clause: - " + items.ToString() + "<br/>"));
}
catch (Exception ex)
{
    this.Controls.Add(new LiteralControl("ContentIterator Individual Processing with indexed lookup column in Where clause: - " + ex.Message + "<br/>"));
}
this.Controls.Add(new LiteralControl("<br/>"));

//ContentIterator batch processing
try
{
    items = 0;
    var spQuery = new SPQuery();
    spQuery.Query = strCAMLQuery;
    ContentIterator iterator = new ContentIterator();
    iterator.ProcessListItems(announcementsList, strCAMLQuery, (uint)2000, true, ProcessItemsInBatch, ProcessErrorsInBatch);
    this.Controls.Add(new LiteralControl("ContentIterator Batch Processing with indexed lookup column in Where clause: - " + items.ToString() + "<br/>"));
}
catch (Exception ex)
{
    this.Controls.Add(new LiteralControl("ContentIterator Batch Processing with indexed lookup column in Where clause: - " + ex.Message + "<br/>"));
}
this.Controls.Add(new LiteralControl("<br/>"));


Outcome:
Figure 6 : Test 6 result
In this case, both ContentIterator approaches failed to return correct number of records.

Test 5 and Test 6  proves that ContentIterator when used along with indexed lookup column returns the items fetched in the first page set using row limit (see the ContentIterator code provided in Test 3 above). This is clearly a bug to me. I am on SP1 + Aug CU. It should either return correct number of records or should throw the exception. I have added this into forums here. Let us see what MS people says about it.


Myths proven wrong and key take aways:
1.   Individual processing does not solve list throttling for more than 5000 records: This is a myth. See the outcome of Test 3.
2.   Beaware of lookup columns in where clause: See the outcome of Test 5 and Test 6.
3.   ContentIterator 100% solves the list throttling issues : This is again a myth. ContentIterator has limitations. It does not support all queries. I found that for complex CAML queries having joins on multiple lists (I used a query joining 6 lists), content iterator thrown an exception even after indexing is done properly on columns present in where clause. To understand why it thrown an exception, I executed same query using SPQuery traditional approach using admin credentials and found that it too failed. I then removed the ContentIterator.ItemEnumerationOrderByNVPField clause in query and executed it again using traditional approach and this time it worked. Hence I think that there is something wrong happening when we use ContentIterator.ItemEnumerationOrderByNVPField order by clause in a complex join query because of which both traditional as well as ContentIterator approach fails. Note that this clause works perfectly in simple join queries. I will add this into forums. May be someone will be able to explain why complex join queries with ContentIterator.ItemEnumerationOrderByNVPField order by clause fails.
4.   ContentIterator is slow due to its nature of fetching the records into pages.
5.   ContentIterator has to be used along with ContentIterator.ItemEnumerationOrderByNVPField. Otherwise for administrators, normal list throttling threshold will be applied. Due to this restriction, custom order by clause cannot be applied. You will have to write code for custom sorting once ContentIterator returns the results.

Hopefully this post helps people facing list throttling exceptions. You can find the sample source code here.

I am used to capture happenings in my life into my posts so that I will remember them. The happenings for this post are the office awards I received in last two weeks.
1.   Most Valuable Player – from my account
2.   Trailblazer: Pat on the back – from my practice.