Leveraging SQL Server’s XML Support to Improve Application Performance

Share on FacebookTweet about this on TwitterShare on LinkedIn

Share on FacebookTweet about this on TwitterShare on LinkedIn

You’re performing a unit test for a page that has controls to pick multiple options, like assigning access to multiple users to a specific piece of data, before clicking the “save” button which commits the changes to the database. The test passes for one user and it works perfectly. You select 3 or 4 users and it’s still snappy. You select 50 users and try again. And wait. And wait. Unfortunately, the moment of truth has arrived; the performance issue lurking in the code has just been exposed!

segue-blog-SQL-Server-XML-Support

 

It’s likely that the problem will be traced to a loop in the code that iterates over each selected item performing some database operations. It’s inconsequential when done once or twice, but when done twenty or more times it becomes a problem. Fortunately, SQL Server 2005 introduced several tools that allow you to bundle up a large set of data, transmit it via XML, and then manipulate it in the database without looping.

A Database-centric Solution

To demonstrate, assume some type of access control list is being updated. Access can be read, written, removed, and published and users can have multiple access rights allowed per data item. The graphic below gives a simplified ERD with sample data:

a simplified ERD with sample data.

To utilize SQL Server’s XML, first modify the code’s loop to produce XML instead of initiating a database call:

//  The following is JavaScript code from within SmartClient, but
//  the programming language is irrelevant.  This creates xml 
//  containing multiple row items with the three data elements 
//  required for populating the ACL table.
var xmlChanges = xmlChanges + "’;
for ( var intRowPtr = 0; intRowPtr < this.getTotalRows(); intRowPtr++ ) {
   var row = this.getEditedRecord( intRowPtr );
   xmlChanges = xmlChanges + "";
   xmlChanges = xmlChanges + "" + row["USERID"] + "";
   xmlChanges = xmlChanges + "" + this.dataId + "";
   xmlChanges = xmlChanges + "" + row["ACCESSID"] + "";
   xmlChanges = xmlChanges + "";
}

So here’s where the real work begins. When moving the database manipulations to the database, a stored procedure will be utilized to make the appropriate changes. First the standard creation and parameter lines:

Create Procedure ACLUpdate
   @data Xml
As
Begin
   Set NoCount On

Note the @data parameter is of type XML.

Next, a temporary table is defined to store the ACL information. In this case a memory table is used instead of a traditional #temp table. For the size of datasets that are expected, the memory table should work well. However, a physical temp table can be indexed for better performance if needed.

   Declare @ACLUpdates Table
   (
      DataId Integer,
      UserId Integer,
      AccessId Integer
   )

Now insert the data into the temp table. It validates that each input field has a correlating record in the appropriate table. It also accepts null values which will be covered later.

   Insert Into @ACLUpdates
   (
      DataId,
      UserId,
      AccessId
   )
   Select
      inputXML.row.value('(DataId)[1]', 'Integer') As 'DataId',
      inputXML.row.value('(UserId)[1]', 'Integer') As 'UserId',
      inputXML.row.value('(AccessId)[1]', 'Integer') As 'AccessId'
   From
      @data.nodes('/row') As inputXML(row)
      Inner Join dbo.Data On
         Data.DataId = inputXML.row.value('(DataId)[1]', 'Integer')
         Or inputXML.row.value('(DataId)[1]', 'Integer') Is Null
      Inner Join dbo.[User] On
         [User].UserId = inputXML.row.value('(UserId)[1]', 'Integer')
         Or inputXML.row.value('(UserId)[1]', 'Integer') Is Null
      Inner Join dbo.Access On
         Access.AccessId =
         inputXML.row.value('(AccessId)[1]', 'Integer')
         Or inputXML.row.value('(AccessId)[1]', 'Integer') Is Null

After the conversion from XML to a table is done, there are two basic steps to complete. First, remove any ACL records that are missing from the input XML, then insert the records that were passed in. The removal of records is, by far, the more complicated of the two steps:

   Delete ACL
   Output
      'deleted' As [operation],
      deleted.*
   From
      dbo.ACL
      --  Get matches by DataID / UserId, but get all AccessIds
      Inner Join @ACLUpdates As ACLUpdates On
         IsNull( ACLUpdates.DataId, ACL.DataId ) = ACL.DataId
         And IsNull( ACLUpdates.UserId, ACL.UserId ) = ACL.UserId
      --  ACLIgnore determines if ACLUpdate records that
      --  should NOT be removed.
      Left Outer Join @ACLUpdates As ACLIgnore On
         IsNull( ACLIgnore.DataId, ACL.DataId ) = ACL.DataId
         And IsNull( ACLIgnore.UserId, ACL.UserId ) = ACL.UserId
         And ACLIgnore.AccessId = ACL.AccessId
   Where
      -- Anything that does NOT have a match in
      -- ACLIgnore will be removed.
      ACLIgnore.AccessId Is Null
      And
      (
         ACLUpdates.UserId Is Not Null
         Or ACLUpdates.DataId Is Not Null
      )

The Inner Join (aliased as ACLUpdates) gets all existing ACL records that match by UserId and DataId, but ignores the AccessId for those records.

The Left Outer Join (ACLIgnore) is used to strip out records that will be added in the next step. There’s no advantage to deleting the records and adding them back in.

Note the use of IsNull within both the Joins on the @ACLUpdates table. This allows the stored procedure to remove all users for a particular Data record or all access to all Data records for a particular user. The final And in the Where clause prevents you from deleting all records from the table.

The Output is included for debugging purposes. It will display the results of your deletion.

Next, insert the requested records into the ACL table:

 Insert Into dbo.ACL
      (
         DataId,
         UserId,
         AccessId
      )
      Output
         'inserted' As [operation],
         inserted.*
      Select
         ACLUpdates.DataId,
         ACLUpdates.UserId,
         ACLUpdates.AccessId
      From
         @ACLUpdates As ACLUpdates
         Left Outer Join
         (
            Select
               ACL.DataId,
               ACL.UserId,
               ACL.AccessId
            From
               dbo.ACL
               Inner Join @ACLUpdates As ACLUpdates On
                  ACL.DataId = ACLUpdates.DataId
                  And ACL.UserId = ACLUpdates.UserId
                 And ACL.AccessId = ACLUpdates.AccessId
         ) As dExistingACLRecords On
            dExistingACLRecords.DataId = ACLUpdates.DataId
            And dExistingACLRecords.UserId = ACLUpdates.UserId
            And dExistingACLRecords.AccessId = ACLUpdates.AccessId
      Where
         ACLUpdates.AccessId Is Not Null
         And dExistingACLRecords.DataId Is Null
End  -- ACLUpdate

It’s safe to assume there is a uniqueness constraint across all three fields in the ACL table (UserId, DataId, and AccessId). So the Left Outer Join prevents you from adding a record that already exists. This check is done, since it’s entirely possible another user has added an identical record before you clicked the “save” button in the application.

Also note that the Left Outer Join uses a derived query. It is highly likely that an ACL table would contain a large number of records, potentially numbering in the hundreds of thousands or millions. Any outer join on such a large table often results in a table scan even though you’re joining on indexed fields. The derived query’s Select pares it down the record set to only those that will be affected by your changes, which will probably 10 or less records.

Finally, all that’s left to do is call the stored procedure with your XML:

--  Load up some test data.
Execute ACLUpdate
      '20750011
       20050013
       20050014
       20050021
       20050033
       20050032
       21750031
       22550031
       29250031
       22350044
       22350043
       22350042
       34470061
       34481231
       34477411
       29663201'
Go
--  UserId 207 - add "Write", "Delete", and "Publish" acces.  Remove “Read”.
Execute ACLUpdate
      '20750012
       20750013
       20750014'
Go
--  UserId 207 - only gets "Write" and "Delete" Access.  Remove “Publish”.
Execute ACLUpdate
      '20750012
       20750013'
Go
--  This removes all access for user 207 to dataId 5001
--  * NOTE: The AccessId attribute has been removed.  It will be
--          null when it is entered into the @ACLUpdate table.
Execute ACLUpdate '2075001'
Go

Now the handling of large data sets has been moved to the database which, if handled correctly, is much better suited for the task. While the above example is specific to an access control list, it’s easy to tweak it for any loop that updates the database.

Need Help? Contact us